pandas - Subsetting Rows with Categorical Variables

by Jyubaeng2 2023. 7. 30.

Data Import

Since the Boston dataset has no categorical variables, I will use a dummy dataset for this example.

Let's consider a hypothetical dataset called "employee_data" with a categorical variable "Department" alongside the columns "Name", "Age", and "Salary". We will use this dataset to subset rows based on the "Department" category.

 

import pandas as pd

# Sample employee data with a categorical variable "Department"
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Ella'],
    'Age': [25, 30, 22, 27, 28],
    'Department': ['HR', 'IT', 'Finance', 'HR', 'IT'],
    'Salary': [50000, 60000, 45000, 55000, 58000]
}

# Build the DataFrame from the dictionary of columns
employee_data = pd.DataFrame(data)
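
Before subsetting, it can help to check which categories the column actually contains. The following is a minimal sketch that rebuilds the same sample data; the names `departments` and `department_counts` are only illustrative, and the conversion to the pandas `category` dtype is an optional extra step, not something the filters in this post require.

```python
import pandas as pd

employee_data = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Ella'],
    'Age': [25, 30, 22, 27, 28],
    'Department': ['HR', 'IT', 'Finance', 'HR', 'IT'],
    'Salary': [50000, 60000, 45000, 55000, 58000]
})

# Distinct department labels, in order of first appearance
departments = list(employee_data['Department'].unique())
print(departments)

# Number of rows in each department
department_counts = employee_data['Department'].value_counts()
print(department_counts)

# Optional (assumption: you want an explicit categorical column)
employee_data['Department'] = employee_data['Department'].astype('category')
print(employee_data['Department'].dtype)
```

Knowing the exact labels up front helps avoid filters that silently return an empty DataFrame because of a misspelled category.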

 

Subsetting rows based on the categorical variable "Department"

# Subsetting rows based on the categorical variable "Department"
hr_department = employee_data[employee_data['Department'] == 'HR']
it_department = employee_data[employee_data['Department'] == 'IT']
finance_department = employee_data[employee_data['Department'] == 'Finance']

print("HR Department:")
print(hr_department)

print("\nIT Department:")
print(it_department)

print("\nFinance Department:")
print(finance_department)
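
Writing one filter per department works for three categories but gets repetitive with more. As a hedged alternative, the sketch below uses `groupby` to collect one subset per label; the dictionary name `subsets` is hypothetical, and the sample data is rebuilt so the block runs on its own.

```python
import pandas as pd

employee_data = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Ella'],
    'Age': [25, 30, 22, 27, 28],
    'Department': ['HR', 'IT', 'Finance', 'HR', 'IT'],
    'Salary': [50000, 60000, 45000, 55000, 58000]
})

# Iterating over a groupby yields (label, sub-DataFrame) pairs,
# so each department's rows can be stored under its own key
subsets = {
    department: rows
    for department, rows in employee_data.groupby('Department')
}

for department, rows in subsets.items():
    print(f"\n{department} Department:")
    print(rows)
```

Each value in `subsets` holds the same rows as the corresponding `== 'HR'` style filter, so `subsets['HR']` can stand in for `hr_department`.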

 

Filtering Categorical Variables using .isin()

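The same subsets can be selected with .isin(), which keeps the rows whose "Department" value matches any item in a given list, so several categories can be selected in one step.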

# Subsetting rows based on the categorical variable "Department" using the .isin() method
selected_departments = employee_data[employee_data['Department'].isin(['HR', 'IT'])]
finance_department = employee_data[employee_data['Department'].isin(['Finance'])]

print("HR and IT Departments:")
print(selected_departments)

print("\nFinance Department:")
print(finance_department)
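
The .isin() mask can also be inverted or combined with other conditions. The sketch below is an illustration under the same sample data; the variable names and the salary threshold of 55000 are arbitrary choices for the example.

```python
import pandas as pd

employee_data = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Ella'],
    'Age': [25, 30, 22, 27, 28],
    'Department': ['HR', 'IT', 'Finance', 'HR', 'IT'],
    'Salary': [50000, 60000, 45000, 55000, 58000]
})

# ~ inverts the boolean mask: keep rows whose department is NOT HR or IT
not_hr_or_it = employee_data[~employee_data['Department'].isin(['HR', 'IT'])]
print(not_hr_or_it)

# & combines two masks; the comparison is parenthesized
# because & binds more tightly than >
well_paid_hr_it = employee_data[
    employee_data['Department'].isin(['HR', 'IT'])
    & (employee_data['Salary'] > 55000)
]
print(well_paid_hr_it)
```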

 

 

https://ai-fin-tech.tistory.com/entry/Adding-New-Columns-and-Rows-to-DataFrame-with-pandas

 

Adding New Columns and Rows to DataFrame with pandas

 
