Adding New Columns with pandas
Since the Boston Housing Prices dataset does not contain a meaningful categorical variable, we can create one from the existing numerical features. Let's create a new column named "Age_Category" based on the "AGE" feature (the proportion of owner-occupied units built before 1940). We'll group the data into three categories: "Young", "Middle-aged", and "Old". The bin ranges are arbitrary and chosen only for demonstration purposes.
In this example, we used the pd.cut() function from pandas to create the new "Age_Category" column based on the "AGE" feature. We defined the bin edges as [0, 30, 60, float('inf')] and the labels as 'Young', 'Middle-aged', and 'Old'. pd.cut() then assigns each value in the "AGE" column to the bin it falls into. By default the intervals are closed on the right, so the bins are (0, 30], (30, 60], and (60, inf).
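The binning described above can be sketched as follows. This is a minimal illustration, not the original code: the small DataFrame `df` and its "AGE" values are made up so the snippet runs on its own, while the bin edges and labels come from the text.

```python
import pandas as pd

# Hypothetical stand-in for the Boston data: a single "AGE" column with made-up values.
df = pd.DataFrame({"AGE": [15.0, 30.0, 45.5, 60.0, 85.2, 100.0]})

# Bin edges and labels as described in the text.
bins = [0, 30, 60, float("inf")]
labels = ["Young", "Middle-aged", "Old"]

# pd.cut() uses right-closed intervals by default, so 30 is "Young" and 60 is "Middle-aged".
df["Age_Category"] = pd.cut(df["AGE"], bins=bins, labels=labels)

print(df)
print(df["Age_Category"].value_counts())
```

Because the first interval is open on the left, a value of exactly 0 would become NaN; passing include_lowest=True to pd.cut() is one way to include it.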
This new "Age_Category" column can now be used as a categorical variable for further analysis, visualization, or modeling purposes. You can similarly create other meaningful columns for feature engineering based on the existing features in the dataset.
Next, let's create a new numerical variable named "Price_per_Room" that represents the housing price per room for each row. This gives additional insight into how prices relate to the number of rooms.
In this example, we calculated "Price_per_Room" by dividing the "PRICE" column (housing price) by the "RM" column (number of rooms) for each row. The resulting column expresses price on a per-room basis, which makes it easier to compare rows whose prices differ mainly because of the number of rooms.
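The division described above can be sketched like this. The DataFrame `df` and its values are hypothetical placeholders; only the column names "PRICE", "RM", and "Price_per_Room" come from the text.

```python
import pandas as pd

# Hypothetical stand-in for the Boston data with made-up values.
df = pd.DataFrame({
    "RM": [6.5, 5.0, 8.0],        # number of rooms
    "PRICE": [26.0, 15.0, 48.0],  # housing price
})

# Element-wise division: each row's price divided by that row's room count.
df["Price_per_Room"] = df["PRICE"] / df["RM"]

print(df)
```

The operation is vectorized, so no explicit loop over rows is needed. If "RM" could contain zeros, the result would hold inf values that should be handled before modeling.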
The "Price_per_Room" column can now be used as a continuous numerical feature for further analysis, visualization, or modeling. It complements "PRICE" and "RM" by showing how price scales with the number of rooms.
Adding New Rows with pandas
Let's assume the new row contains the following data:
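The original values for the new row are not reproduced here. As a purely hypothetical illustration, a new row can be held in a dictionary keyed by column name. The three columns and all values below are made up and stand in for the full set of Boston columns.

```python
import pandas as pd

# Hypothetical new row: one made-up value per column, keyed by column name.
new_row = {"RM": 6.2, "AGE": 45.0, "PRICE": 24.5}

# Wrapping the dict in a list produces a one-row DataFrame with matching column names.
new_row_df = pd.DataFrame([new_row])

print(new_row_df)
```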
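There are several ways to add a row to a DataFrame: DataFrame.append(), pd.concat(), iloc[], and loc[]. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, that concat() is a top-level pandas function rather than a DataFrame method, and that iloc[] can only address existing positions, so by itself it cannot add a row past the end of the DataFrame.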
Using .append()
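A minimal sketch of this approach, assuming a small made-up DataFrame `df` and a hypothetical `new_row` dictionary. Since DataFrame.append() no longer exists in pandas 2.0 and later, the snippet checks for it and falls back to pd.concat() so that it runs on either version.

```python
import pandas as pd

# Hypothetical stand-in for the Boston data and a made-up new row.
df = pd.DataFrame({"RM": [6.5, 5.0], "AGE": [65.2, 78.9], "PRICE": [24.0, 21.6]})
new_row = {"RM": 6.2, "AGE": 45.0, "PRICE": 24.5}

if hasattr(pd.DataFrame, "append"):
    # pandas < 2.0: append() returns a new DataFrame (with a FutureWarning from 1.4 on).
    result = df.append(new_row, ignore_index=True)
else:
    # pandas >= 2.0: append() was removed, so use pd.concat() instead.
    result = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

print(result)
```

In both branches the original `df` is left unchanged and a new DataFrame is returned.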
Using pd.concat()
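A minimal sketch with pd.concat(), again using a made-up DataFrame `df` and a hypothetical `new_row`. The new row is first wrapped in a one-row DataFrame so that its column names line up with those of `df`.

```python
import pandas as pd

# Hypothetical stand-in for the Boston data and a made-up new row.
df = pd.DataFrame({"RM": [6.5, 5.0], "AGE": [65.2, 78.9], "PRICE": [24.0, 21.6]})
new_row = {"RM": 6.2, "AGE": 45.0, "PRICE": 24.5}

# Build a one-row DataFrame, then stack it below the existing rows.
new_row_df = pd.DataFrame([new_row])
result = pd.concat([df, new_row_df], ignore_index=True)  # ignore_index renumbers rows 0..n-1

print(result)
```

When many rows need to be added, collecting them in a list and calling pd.concat() once is generally cheaper than concatenating inside a loop.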
Using .iloc[]
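Because iloc[] only addresses existing positions, it cannot grow a DataFrame on its own. The sketch below shows two things it can do, under the assumption of a made-up `df` and `new_row`: inserting a row at a chosen position by slicing with iloc[] and recombining with pd.concat(), and overwriting an existing row in place. This may differ from the approach in the original post.

```python
import pandas as pd

# Hypothetical stand-in for the Boston data and a made-up new row.
df = pd.DataFrame({"RM": [6.5, 5.0], "AGE": [65.2, 78.9], "PRICE": [24.0, 21.6]})
new_row = {"RM": 6.2, "AGE": 45.0, "PRICE": 24.5}
new_row_df = pd.DataFrame([new_row])

# Insert at position 1: rows before the position, the new row, then the remaining rows.
position = 1
result = pd.concat(
    [df.iloc[:position], new_row_df, df.iloc[position:]],
    ignore_index=True,
)

# iloc[] can also overwrite an existing row by position (the row count stays the same).
overwritten = df.copy()
overwritten.iloc[0] = [new_row[col] for col in overwritten.columns]

print(result)
print(overwritten)
```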
Using .loc[]
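A minimal sketch with loc[], which adds a row when it is assigned to a label that does not exist yet. The DataFrame `df` and `new_row` are made up, and the snippet assumes the default integer index (0, 1, 2, ...), so that len(result) is guaranteed to be an unused label.

```python
import pandas as pd

# Hypothetical stand-in for the Boston data and a made-up new row.
df = pd.DataFrame({"RM": [6.5, 5.0], "AGE": [65.2, 78.9], "PRICE": [24.0, 21.6]})
new_row = {"RM": 6.2, "AGE": 45.0, "PRICE": 24.5}

# Work on a copy so the original DataFrame is left untouched.
result = df.copy()

# Assigning to a new label enlarges the DataFrame in place.
# The values are listed in the same order as the columns.
result.loc[len(result)] = [new_row[col] for col in result.columns]

print(result)
```

Unlike pd.concat(), this modifies the DataFrame in place. If the index is not the default one, len(result) may already be in use as a label, in which case the assignment would overwrite that row instead of adding one.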