Reputation: 101
This is my dataset:
import numpy as np
import pandas as pd

data = {'brand_no': ['BH 1', 'BH 2', 'BH 5', 'BH 7', 'BH 6'],
        '1240000601_min': [5.87, 5.87, 5.87, 5.87, np.nan],
        '1240000601_max': [8.87, 7.47, 10.1, 1.9, 10.8],
        '1240000603_min': [5.87, np.nan, 6.5, 2.0, 7.8],
        '1240000603_max': [8.57, 7.47, 10.2, 1.0, 10.2],
        '1240000604_min': [5.87, 5.67, 6.9, 1.0, 7.8],
        '1240000604_max': [8.87, 8.87, 8.87, np.nan, 8.87],
        '1240000605_min': [15.87, 15.67, 16.9, 1.0, 17.8],
        '1240000605_max': [18.11, 17.47, 20.1, 1.9, 22.6],
        '1240000606_min': [8.12, 8.12, np.nan, 8.12, np.nan],
        '1240000606_max': [np.nan, 7.47, 10.1, 1.9, np.nan]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
df
As you can see, some columns hold the same value in every row, apart from NaN (the data is sparse, so NaNs appear as well). I want to drop every column whose value is identical across all rows; in this case that is 1240000601_min, 1240000604_max and 1240000606_min.
In the desired result, all columns with the same value across all rows are dropped. Please help me achieve this.
Upvotes: 2
Views: 694
Reputation: 23217
You can use df.nunique() to count the unique items in each column, then keep only the columns with more than one unique value via .gt(1). This yields a boolean mask over the columns. Pass that mask as the second (column) indexer of .loc to filter the columns:
df_cleaned = df.loc[:, df.nunique().gt(1)]
Result:
print(df_cleaned)
brand_no 1240000601_max 1240000603_min 1240000603_max 1240000604_min 1240000605_min 1240000605_max 1240000606_max
0 BH 1 8.87 5.87 8.57 5.87 15.87 18.11 NaN
1 BH 2 7.47 NaN 7.47 5.67 15.67 17.47 7.47
2 BH 5 10.10 6.50 10.20 6.90 16.90 20.10 10.10
3 BH 7 1.90 2.00 1.00 1.00 1.00 1.90 1.90
4 BH 6 10.80 7.80 10.20 7.80 17.80 22.60 NaN
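For illustration, this is the intermediate mask the snippet builds on the sample data above; the three columns that are constant apart from NaN come out False and get filtered away:
print(df.nunique().gt(1))

brand_no           True
1240000601_min    False
1240000601_max     True
1240000603_min     True
1240000603_max     True
1240000604_min     True
1240000604_max    False
1240000605_min     True
1240000605_max     True
1240000606_min    False
1240000606_max     True
dtype: bool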
Upvotes: 1
Reputation: 1552
You can use something like this:
columns = [column for column in df.columns if df[column].nunique() == 1]
df = df.drop(columns=columns)  # drop returns a new DataFrame, so assign it back
df.nunique() ignores NaNs by default (dropna=True), so you don't have to worry about the missing values.
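A quick check of that behaviour, using the sample data above: 1240000601_min is 5.87 in every row except for a single NaN, so
print(df['1240000601_min'].nunique())              # 1 -- NaN ignored (dropna=True)
print(df['1240000601_min'].nunique(dropna=False))  # 2 -- NaN counted as a distinct value
and the list comprehension above still treats it as a constant column to drop.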
Upvotes: 1