xkcd

Reputation: 472

How to drop a column in a Pandas DataFrame which contains the same value

I have a Pandas DataFrame with some columns that have the same value in every row.

For example:

Col1    Col2     Col3 ....  ColX  ColY    ColZ
323     False    324          4    abc    Sync 
232     False    342          4    def    Sync
364     False    2343         4    ghi    Sync

I would like to drop Col2, ColX and ColZ from the DataFrame above.

Upvotes: 3

Views: 4496

Answers (3)

rachwa

Reputation: 2310

You can use nunique together with df.columns to select only the columns that have more than one unique value:

In [6]: df[df.columns[df.nunique() > 1]]
Out[6]: 
   Col1  Col3 ColY
0   323   324  abc
1   232   342  def
2   364  2343  ghi
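The one-liner above can be checked end to end on the question's data. This is a minimal sketch: the DataFrame is reconstructed from the question's table (the elided "...." columns are left out), and the df_nan frame is a made-up example. One detail worth knowing is that nunique ignores NaN by default, so a column holding one value plus some missing entries also counts as having a single unique value; pass dropna=False if missing values should count as a distinct value.

```python
import pandas as pd

# Sample data reconstructed from the question (elided "...." columns omitted)
df = pd.DataFrame({
    "Col1": [323, 232, 364],
    "Col2": [False, False, False],
    "Col3": [324, 342, 2343],
    "ColX": [4, 4, 4],
    "ColY": ["abc", "def", "ghi"],
    "ColZ": ["Sync", "Sync", "Sync"],
})

# nunique() returns the number of distinct values per column (NaN excluded by default)
counts = df.nunique()
kept = df[df.columns[counts > 1]]
print(list(kept.columns))  # ['Col1', 'Col3', 'ColY']

# Hypothetical frame with a missing value: NaN is only counted when dropna=False
df_nan = pd.DataFrame({"a": [4, None, 4], "b": [1, 2, 3]})
print(df_nan.nunique().tolist())              # [1, 3]
print(df_nan.nunique(dropna=False).tolist())  # [2, 3]
```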

Upvotes: 1

user2285236

You can compare the DataFrame against a particular row (I chose the first one with df.iloc[0]) and use loc to select the columns that satisfy the condition you specified:

df.loc[:, ~(df == df.iloc[0]).all()]
Out: 
   Col1  Col3 ColY
0   323   324  abc
1   232   342  def
2   364  2343  ghi
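Spelled out step by step, the comparison works like this: df == df.iloc[0] broadcasts the first row across every row, and .all() marks the columns in which every row matched. Below is a minimal sketch on data reconstructed from the question; the df_nan frame is a made-up example showing one caveat, namely that NaN never compares equal to NaN, so an all-NaN column is kept by this approach.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (elided "...." columns omitted)
df = pd.DataFrame({
    "Col1": [323, 232, 364],
    "Col2": [False, False, False],
    "Col3": [324, 342, 2343],
    "ColX": [4, 4, 4],
    "ColY": ["abc", "def", "ghi"],
    "ColZ": ["Sync", "Sync", "Sync"],
})

# Element-wise comparison of every row with the first row, aligned on column labels
same_as_first = (df == df.iloc[0]).all()  # True for columns where every row matched
kept = df.loc[:, ~same_as_first]
print(list(kept.columns))  # ['Col1', 'Col3', 'ColY']

# Caveat: NaN != NaN, so a column that is entirely NaN is not treated as constant
df_nan = pd.DataFrame({"a": [np.nan, np.nan], "b": [1, 1]})
kept_nan = df_nan.loc[:, ~(df_nan == df_nan.iloc[0]).all()]
print(list(kept_nan.columns))  # ['a']
```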

Timings:

@root's suggestion, nunique, is much faster than comparing the DataFrame against a single row. Unless you have a huge number of columns (thousands, for example), iterating over the columns as @MMF suggested looks like a more efficient approach.

df = pd.concat([df]*10**5, ignore_index=True)

%timeit df.loc[:, ~(df == df.iloc[0]).all()]
1 loop, best of 3: 377 ms per loop

%timeit df[[col for col in df if not df[col].nunique()==1]]
10 loops, best of 3: 35.6 ms per loop


df = pd.concat([df]*10, axis=1, ignore_index=True)

%timeit df.loc[:, ~(df == df.iloc[0]).all()]
1 loop, best of 3: 3.71 s per loop

%timeit df[[col for col in df if not df[col].nunique()==1]]
1 loop, best of 3: 353 ms per loop


df = pd.concat([df]*3, axis=1, ignore_index=True)

%timeit df.loc[:, ~(df == df.iloc[0]).all()]
1 loop, best of 3: 11.3 s per loop

%timeit df[[col for col in df if not df[col].nunique()==1]]
1 loop, best of 3: 1.06 s per loop
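To rerun a comparison like this outside IPython, the standard-library timeit module can stand in for the %timeit magic. This is a sketch only: the two helper function names are hypothetical, the sample DataFrame is reconstructed from the question, and absolute timings depend on the machine and pandas version, so no numbers are claimed here.

```python
import timeit

import pandas as pd

# Sample data reconstructed from the question (elided "...." columns omitted)
df = pd.DataFrame({
    "Col1": [323, 232, 364],
    "Col2": [False, False, False],
    "Col3": [324, 342, 2343],
    "ColX": [4, 4, 4],
    "ColY": ["abc", "def", "ghi"],
    "ColZ": ["Sync", "Sync", "Sync"],
})

# Enlarge the frame the same way as the first timing above (300,000 rows)
big = pd.concat([df] * 10**5, ignore_index=True)


def by_comparison(frame):
    # Hypothetical helper: compare every row against the first row
    return frame.loc[:, ~(frame == frame.iloc[0]).all()]


def by_nunique_per_column(frame):
    # Hypothetical helper: count distinct values one column at a time
    return frame[[col for col in frame if frame[col].nunique() != 1]]


# Both approaches should keep the same columns before we compare their speed
assert list(by_comparison(big).columns) == list(by_nunique_per_column(big).columns)

for func in (by_comparison, by_nunique_per_column):
    # Best of 3 repeats, each repeat calling the function once
    best = min(timeit.repeat(lambda: func(big), number=1, repeat=3))
    print(f"{func.__name__}: {best * 1000:.1f} ms")
```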

Upvotes: 7

MMF

Reputation: 5921

You can also do this by checking the length of the set of values in each column:

df = df[[col for col in df if len(set(df[col])) != 1]]
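Since the question asks to drop the columns, the same set-based test can instead collect the names of the constant columns and pass them to DataFrame.drop. A minimal sketch, assuming the sample data reconstructed from the question's table:

```python
import pandas as pd

# Sample data reconstructed from the question (elided "...." columns omitted)
df = pd.DataFrame({
    "Col1": [323, 232, 364],
    "Col2": [False, False, False],
    "Col3": [324, 342, 2343],
    "ColX": [4, 4, 4],
    "ColY": ["abc", "def", "ghi"],
    "ColZ": ["Sync", "Sync", "Sync"],
})

# A column is constant when the set of its values has exactly one element
constant = [col for col in df if len(set(df[col])) == 1]
print(constant)  # ['Col2', 'ColX', 'ColZ']

# Drop the constant columns explicitly
result = df.drop(columns=constant)
print(list(result.columns))  # ['Col1', 'Col3', 'ColY']
```

Building the list first also makes it easy to log or inspect which columns were removed.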

Upvotes: 6