
pandas - Complete Usage of loc and iloc

by Jyubaeng2 2023. 7. 30.

Data Import

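import pandas as pd
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import requires scikit-learn < 1.2.
from sklearn.datasets import load_boston

# Load the Boston Housing Prices dataset
boston = load_boston()
# Build a DataFrame from the 13 feature columns, then add the target as PRICE
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df['PRICE'] = boston.target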
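
If load_boston is not available in your scikit-learn version, the indexing examples in this post can still be tried on a stand-in frame. The block below is only a sketch: the helper name make_demo_df is hypothetical, and the values are randomly generated, not the real Boston data. Only the column names match the dataset.

```python
import numpy as np
import pandas as pd

# Column names of the Boston dataset, plus the PRICE target used in this post
COLUMNS = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
           'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'PRICE']


def make_demo_df(n_rows=30, seed=0):
    """Hypothetical helper: random stand-in values, NOT the real Boston data."""
    rng = np.random.default_rng(seed)
    data = rng.uniform(0, 10, size=(n_rows, len(COLUMNS)))
    # No index is passed, so rows get the default labels 0..n_rows-1
    return pd.DataFrame(data, columns=COLUMNS)


demo_df = make_demo_df()
print(demo_df.shape)  # (30, 14)
```

Assigning boston_df = make_demo_df() should be enough to run the loc and iloc examples, although the filtered subsets will of course differ from those on the real data.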

 

Complete usage of loc

The .loc[] indexer in pandas selects data by label or by boolean mask, and it can also be used to assign values. Because boston_df has the default integer index, its row labels are the integers 0, 1, 2, and so on. The examples below cover the most common usages of .loc[] with the Boston Housing Prices dataset:

 

# Usage of loc
# 1. Selecting specific rows and columns by labels
subset_1 = boston_df.loc[[0, 5, 10], ['CRIM', 'RM', 'AGE', 'PRICE']]

# 2. Selecting all rows for specific columns by labels
subset_2 = boston_df.loc[:, ['CRIM', 'RM', 'AGE', 'PRICE']]

# 3. Selecting all rows for a single column by label
subset_3 = boston_df.loc[:, 'PRICE']

# 4. Selecting a single row by label
subset_4 = boston_df.loc[10]

# 5. Selecting rows based on a condition for a specific column
subset_5 = boston_df.loc[boston_df['CRIM'] < 1]

# 6. Selecting rows based on multiple conditions for different columns
subset_6 = boston_df.loc[
    (boston_df['CRIM'] < 1) &
    (boston_df['RM'] > 6) &
    (boston_df['AGE'] < 50)
]

# 7. Selecting a range of rows and all columns using a slice of row labels
# (label slices include both endpoints, so this returns rows 10 through 15)
subset_7 = boston_df.loc[10:15, :]

# 8. Selecting all rows and a range of columns using a slice of column labels
subset_8 = boston_df.loc[:, 'CRIM':'AGE']

# 9. Selecting a range of rows and a range of columns using label slices for both
subset_9 = boston_df.loc[10:15, 'CRIM':'AGE']

# 10. Modifying values in specific rows and columns using loc[]
# (this changes boston_df in place, so later examples see PRICE = 50 for these rows)
boston_df.loc[boston_df['CRIM'] < 1, 'PRICE'] = 50

# 11. Selecting rows and specific columns using a boolean mask
mask = boston_df['CRIM'] < 1
subset_11 = boston_df.loc[mask, ['CRIM', 'RM', 'PRICE']]

# 12. Combining loc[] with query() for complex filtering
subset_12 = boston_df.loc[
    boston_df.query("CRIM < 1 and RM > 6").index,
    ['CRIM', 'RM', 'PRICE']
]

# 13. Selecting rows based on specific index labels
subset_13 = boston_df.loc[[0, 5, 10]]

# 14. Selecting rows based on a condition and specific columns
subset_14 = boston_df.loc[boston_df['CRIM'] < 1, ['CRIM', 'RM', 'PRICE']]

# 15. Modifying a single cell value using loc[]
boston_df.loc[0, 'PRICE'] = 25

# 16. Selecting rows and specific columns based on complex conditions
subset_16 = boston_df.loc[
    (boston_df['CRIM'] < 1) & (boston_df['RM'] > 6),
    ['CRIM', 'RM', 'PRICE']
]
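
One detail that is easy to miss in the slice examples: loc slices by label and includes both endpoints, while integer-position slicing with iloc excludes the stop position. A minimal sketch on a tiny made-up frame with string labels (the frame and its values are illustrative assumptions, not part of the Boston data):

```python
import pandas as pd

# Tiny illustrative frame with string row labels (values are made up)
df = pd.DataFrame(
    {'CRIM': [0.1, 0.5, 2.0, 3.5], 'RM': [6.5, 5.9, 6.2, 7.1]},
    index=['a', 'b', 'c', 'd'],
)

# Label slice: both 'a' and 'c' are included
by_label = df.loc['a':'c']
print(by_label.index.tolist())     # ['a', 'b', 'c']

# Position slice: the stop position 2 is excluded
by_position = df.iloc[0:2]
print(by_position.index.tolist())  # ['a', 'b']
```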

 

Complete usage of iloc

The .iloc[] indexer in pandas performs integer-location based indexing: rows and columns are selected by their 0-based integer position, regardless of their labels. The examples below cover the most common usages of .iloc[] with the Boston Housing Prices dataset:

# Usage of iloc
# 1. Selecting a single row by integer index
subset_1 = boston_df.iloc[5]

# 2. Selecting multiple rows by integer index
subset_2 = boston_df.iloc[[0, 5, 10]]

# 3. Selecting specific rows and columns by integer index
subset_3 = boston_df.iloc[[0, 5, 10], [0, 5, -1]]

# 4. Selecting all rows for specific columns by integer index
subset_4 = boston_df.iloc[:, [0, 5, -1]]

# 5. Selecting a single cell value by integer index
cell_value = boston_df.iloc[0, 3]

# 6. Modifying a single cell value using iloc[]
boston_df.iloc[0, 3] = 0.5

# 7. Selecting rows and specific columns using integer slices
# (the stop position is excluded: rows 10-19 and columns 2-5)
subset_7 = boston_df.iloc[10:20, 2:6]

# 8. Selecting rows and all columns using integer slices
subset_8 = boston_df.iloc[10:20, :]

# 9. Selecting all rows and specific columns using integer slices
subset_9 = boston_df.iloc[:, 2:6]

# 10. Selecting rows using a boolean mask with iloc[]
# (iloc[] expects a plain boolean array rather than a Series, hence .values)
mask = (boston_df['CRIM'] < 1) & (boston_df['RM'] > 6)
subset_10 = boston_df.iloc[mask.values]

# 11. Selecting rows and specific columns using integer positions
subset_11 = boston_df.iloc[[0, 5, 10], [0, 5, -1]]

# 12. Modifying values in specific rows and columns using iloc[]
boston_df.iloc[[0, 5, 10], [0, 5, -1]] = 100

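# 13. Combining iloc[] with query() for complex filtering
# (query() returns index labels; get_indexer converts them to integer positions,
# so this also works when the index is not the default 0..n-1 range)
matching_labels = boston_df.query("CRIM < 1 and RM > 6").index
subset_13 = boston_df.iloc[
    boston_df.index.get_indexer(matching_labels),
    [0, 5, -1]
]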

# 14. Selecting specific columns using integer positions
subset_14 = boston_df.iloc[:, [0, 5, -1]]

# 15. Selecting a single column by integer position
subset_15 = boston_df.iloc[:, 3]

# 16. Modifying a single column using iloc[]
boston_df.iloc[:, 3] = 50
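
With the default index, row labels and row positions happen to be the same numbers, which hides the difference between loc and iloc. The sketch below uses a small made-up frame whose labels differ from its positions to show why labels should be converted before they are passed to iloc. The frame and its values are illustrative assumptions.

```python
import pandas as pd

# Illustrative frame: labels 10, 20, 30, 40 sit at positions 0, 1, 2, 3
df = pd.DataFrame(
    {'CRIM': [0.1, 2.0, 0.5, 3.5], 'RM': [6.5, 6.2, 5.9, 7.1]},
    index=[10, 20, 30, 40],
)

# query() returns the LABELS of the matching rows
labels = df.query("CRIM < 1").index

# Passing labels straight to iloc treats them as positions and fails here
try:
    df.iloc[labels]
    labels_worked = True
except IndexError:
    labels_worked = False

# Convert labels to positions first, then index by position
positions = df.index.get_indexer(labels)
subset = df.iloc[positions, [0]]
print(subset.index.tolist())  # [10, 30]

# A boolean mask works with iloc once converted to a NumPy array
masked = df.iloc[(df['CRIM'] < 1).to_numpy()]
print(masked.index.tolist())  # [10, 30]
```

When the selection is naturally expressed in labels or conditions, loc is usually the simpler choice; iloc fits best when positions are what you actually know.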

Related post: Subsetting Rows with Categorical Variables
https://ai-fin-tech.tistory.com/entry/Subsetting-Rows-with-Categorical-Variables

 
