Reputation: 97
I am trying to remove outliers from a dataset with this loop:
df_cleaned = pd.DataFrame()
for grade in df.Grade.unique():
    df_per_grade = df[df['Grade'] == grade]
    for column in df_per_grade.columns[5:-2]:
        df_clean = pd.DataFrame(df_per_grade[['Grade', 'Shift']])
        df_clean[column] = df_per_grade[column][df_per_grade[column].between(df_per_grade[column].quantile(0.05),
                                                                             df_per_grade[column].quantile(0.95))]
        df_cleaned = df_cleaned.append(df_clean)
The problem is that it returns a dataset that looks like this:
Index | Grade | Shift | Column 1 | Column 2
0     | P1    | 1     | 5        | NaN
1     | P1    | 1     | 3        | NaN
2     | P2    | 1     | 2        | NaN
3     | P2    | 1     | 1        | NaN
4     | P2    | 1     | 2        | NaN
0     | P1    | 1     | NaN      | 7
1     | P1    | 1     | NaN      | 9
2     | P2    | 1     | NaN      | 9
3     | P2    | 1     | NaN      | 7
4     | P2    | 1     | NaN      | 5
And I would like it to look like this:
Index | Grade | Shift | Column 1 | Column 2
0     | P1    | 1     | 5        | 7
1     | P1    | 1     | 3        | 9
2     | P2    | 1     | 2        | 9
3     | P2    | 1     | 1        | 7
4     | P2    | 1     | 2        | 5
Upvotes: 0
Views: 193
Reputation: 697
Do not use append on the last line of the code. Use merge instead:
df_cleaned = df_cleaned.merge(df_clean)
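For context, append stacks each per-column frame underneath the previous one, which is why every row shows up once per column with NaN elsewhere. Note also that DataFrame.append was removed in pandas 2.0, and merge generally needs common key columns to join on. If the goal is just one row per original row, a different technique from merge avoids building per-column frames altogether: compute the per-grade quantile bounds with groupby and blank out values outside them. This is only a rough sketch under assumptions; the function name mask_outliers_per_grade and its parameters are made up for illustration, and it assumes the value columns are numeric.

```python
import pandas as pd


def mask_outliers_per_grade(df, value_columns, group_column="Grade",
                            lower=0.05, upper=0.95):
    """Return a copy of df where values outside the per-group
    [lower, upper] quantile range are replaced with NaN.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    cleaned = df.copy()
    grouped = df.groupby(group_column)[value_columns]

    # transform broadcasts each group's quantile back onto that group's rows,
    # so low/high have the same index and columns as df[value_columns].
    low = grouped.transform(lambda s: s.quantile(lower))
    high = grouped.transform(lambda s: s.quantile(upper))

    # True where the value lies inside its own group's bounds (inclusive).
    inside = df[value_columns].ge(low) & df[value_columns].le(high)

    # where keeps values where the mask is True and puts NaN elsewhere.
    cleaned[value_columns] = df[value_columns].where(inside)
    return cleaned


if __name__ == "__main__":
    # Made-up sample data using the column names from the question.
    sample = pd.DataFrame({
        "Grade": ["P1"] * 5 + ["P2"] * 5,
        "Shift": [1] * 10,
        "Column 1": [1, 2, 3, 4, 100, 10, 20, 30, 40, 50],
        "Column 2": [7, 9, 9, 7, 5, 1, 1, 1, 1, 1],
    })
    print(mask_outliers_per_grade(sample, ["Column 1", "Column 2"]))
```

With the layout from the question, value_columns would be something like list(df.columns[5:-2]). Outliers become NaN in place rather than producing extra rows; if you want to drop rows that contain any outlier, calling dropna(subset=value_columns) on the result should do that.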
Upvotes: 1