simpleboi

Reputation: 101

Filter out columns having the same value throughout all rows

This is my dataset:

import numpy as np
import pandas as pd

data = {'brand_no': ['BH 1', 'BH 2', 'BH 5', 'BH 7', 'BH 6'],
        '1240000601_min': [5.87, 5.87, 5.87, 5.87, np.nan],
        '1240000601_max': [8.87, 7.47, 10.1, 1.9, 10.8],
        '1240000603_min': [5.87, np.nan, 6.5, 2.0, 7.8],
        '1240000603_max': [8.57, 7.47, 10.2, 1.0, 10.2],
        '1240000604_min': [5.87, 5.67, 6.9, 1.0, 7.8],
        '1240000604_max': [8.87, 8.87, 8.87, np.nan, 8.87],
        '1240000605_min': [15.87, 15.67, 16.9, 1.0, 17.8],
        '1240000605_max': [18.11, 17.47, 20.1, 1.9, 22.6],
        '1240000606_min': [8.12, 8.12, np.nan, 8.12, np.nan],
        '1240000606_max;': [np.nan, 7.47, 10.1, 1.9, np.nan]}

# Create DataFrame
df = pd.DataFrame(data)
# Print the output
df

As you can see, some columns hold the same value in every row, apart from NaNs (the data is sparse, so it contains NaNs as well). I want to drop the columns whose value is the same across all rows: in this case 1240000601_min, 1240000604_max and 1240000606_min.


Desired output: the same DataFrame with columns 1240000601_min, 1240000604_max and 1240000606_min dropped.

All columns with the same value across all rows should be removed. Please help me get this.

Upvotes: 2

Views: 694

Answers (3)

SeaBean

Reputation: 23217

You can use df.nunique() to count the number of unique values in each column, then keep those greater than 1 with .gt(1). This produces a boolean mask over the columns. Pass that mask as the second (column) indexer of .loc to filter the columns:

df_cleaned = df.loc[:, df.nunique().gt(1)]

Result:

print(df_cleaned)

  brand_no  1240000601_max  1240000603_min  1240000603_max  1240000604_min  1240000605_min  1240000605_max  1240000606_max;
0     BH 1            8.87            5.87            8.57            5.87           15.87           18.11              NaN
1     BH 2            7.47             NaN            7.47            5.67           15.67           17.47             7.47
2     BH 5           10.10            6.50           10.20            6.90           16.90           20.10            10.10
3     BH 7            1.90            2.00            1.00            1.00            1.00            1.90             1.90
4     BH 6           10.80            7.80           10.20            7.80           17.80           22.60              NaN
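To see what the intermediate boolean mask looks like, here is a minimal sketch on a small toy DataFrame (the column names const_col and var_col are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'brand_no': ['BH 1', 'BH 2', 'BH 5'],
    'const_col': [5.87, 5.87, np.nan],  # one unique non-NaN value -> dropped
    'var_col': [8.87, 7.47, 10.1],      # varies across rows -> kept
})

mask = df.nunique().gt(1)  # boolean Series indexed by column name
print(mask)                # const_col is False, the other two are True

df_cleaned = df.loc[:, mask]
print(df_cleaned.columns.tolist())  # ['brand_no', 'var_col']
```

The mask is aligned to the columns by label, so .loc selects exactly the columns whose entry is True.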

Upvotes: 1

okrn

Reputation: 54

Try this:

df_cleared = df.loc[:, df.nunique() > 1]

Upvotes: 1

Carlos Melus

Reputation: 1552

You can use something like this:

columns = [column for column in df.columns if df[column].nunique() == 1]
df = df.drop(columns=columns)  # drop returns a new DataFrame, so assign it back

df.nunique() ignores NaNs by default, so you don't have to worry about them.
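A quick sketch of that NaN behaviour, using the same values as the 1240000606_min column from the question:

```python
import numpy as np
import pandas as pd

s = pd.Series([8.12, 8.12, np.nan, 8.12, np.nan])

print(s.nunique())              # 1 -> NaN ignored, column counts as constant
print(s.nunique(dropna=False))  # 2 -> NaN counted as a distinct value
```

So with the default dropna=True, a column that is constant except for NaNs is still detected as constant and gets dropped.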

Upvotes: 1
