I have a dataframe as below: data = {('5105', 'Open'): [1.99,1.98,1.99,2.05,2.15], ('5105', 'Adj Close'): [1.92,1.92,1.96,2.07,2.08], ('5229', 'Open'): [0.01]*5, ('5229', 'Adj Close'): [0.02]*5, ('7076', 'Open'): [1.02,1.01,1.01,1.06,1.06], ('7076', 'Adj Close'): [0.90,0.92,0.94,0.94,0.95]} df = pd.DataFrame(data) 5105 5229 7076 Open Adj Close Open Adj Close Open Adj Close 0 1.99 1.92 0.01 0.02 1.02 0.90 1 1.98 1.92 0.01 0.02 1.01 0.92 2 1.99 1.96 0.01 0.02 1.01 0.94 3 2.05 2.07 0.01 0.02 1.06 0.94 4 2.15 2.08 0.01 0.02 1.06 0.95 As the dataframe above, we can see that df['5229'] has both columns Open and Adj Close having the same values respectively throughout the column. So, I intend to drop it since it will not be useful in my analysis. I have two queries: How do I drop the column on level 0 (that is the 1st column) if its subcolumns have the same values respectively throughout the column? On the other hand, if there's just one subcolumn that has the same values throughout the column, how can I drop it? As this is a conditional-based dropping, I was wondering if df.drop still works in this case? Based on my 1st and 2nd query, in my case above, since the Open and Adj Close are having same values throughout the column, I would like to drop it entirely. The expected output is: 5105 7076 Open Adj Close Open Adj Close 0 1.99 1.92 1.02 0.90 1 1.98 1.92 1.01 0.92 2 1.99 1.96 1.01 0.94 3 2.05 2.07 1.06 0.94 4 2.15 2.08 1.06 0.95 Edit Really thank you for those answering the question. Just to be more concise, I was trying to drop the columns from the dataframe consisting of more than 200 columns given the condition if all the values in that particular column are the same.

pythonpython-3.xpandasdataframemulti-index

Reputation:

Pandas: Conditionally dropping columns based on same values throughout the column in MultiIndex dataframe

I have a dataframe as below:

data = {('5105', 'Open'): [1.99,1.98,1.99,2.05,2.15],
        ('5105', 'Adj Close'): [1.92,1.92,1.96,2.07,2.08],
        ('5229', 'Open'): [0.01]*5,
        ('5229', 'Adj Close'): [0.02]*5,
        ('7076', 'Open'): [1.02,1.01,1.01,1.06,1.06],
        ('7076', 'Adj Close'): [0.90,0.92,0.94,0.94,0.95]}

df = pd.DataFrame(data)

   5105            5229            7076          
   Open Adj Close  Open Adj Close  Open Adj Close
0  1.99      1.92  0.01      0.02  1.02      0.90
1  1.98      1.92  0.01      0.02  1.01      0.92
2  1.99      1.96  0.01      0.02  1.01      0.94
3  2.05      2.07  0.01      0.02  1.06      0.94
4  2.15      2.08  0.01      0.02  1.06      0.95

As the dataframe above, we can see that df['5229'] has both columns Open and Adj Close having the same values respectively throughout the column. So, I intend to drop it since it will not be useful in my analysis.

I have two queries:

How do I drop the column on level 0 (that is the 1st column) if its subcolumns have the same values respectively throughout the column?
On the other hand, if there's just one subcolumn that has the same values throughout the column, how can I drop it?

As this is a conditional-based dropping, I was wondering if df.drop still works in this case?

Based on my 1st and 2nd query, in my case above, since the Open and Adj Close are having same values throughout the column, I would like to drop it entirely.

The expected output is:

   5105            7076          
   Open Adj Close  Open Adj Close
0  1.99      1.92  1.02      0.90
1  1.98      1.92  1.01      0.92
2  1.99      1.96  1.01      0.94
3  2.05      2.07  1.06      0.94
4  2.15      2.08  1.06      0.95

Edit

Really thank you for those answering the question. Just to be more concise, I was trying to drop the columns from the dataframe consisting of more than 200 columns given the condition if all the values in that particular column are the same.

Upvotes: 0

Answers (4)

BENY

Reputation: 323226

Try with nunique

df = df.loc[:,~(df.nunique()==1).values]
Out[125]: 
   5105            7076          
   Open Adj Close  Open Adj Close
0  1.99      1.92  1.02      0.90
1  1.98      1.92  1.01      0.92
2  1.99      1.96  1.01      0.94
3  2.05      2.07  1.06      0.94
4  2.15      2.08  1.06      0.95

Upvotes: 1

user7864386

Reputation:

We could use unstack + groupby + nunique to get the number of unique values in each column. Then select only the columns with more than 1 value by the loc:

out = df[df.unstack().groupby(level=[0,1]).nunique().loc[lambda x: x!=1].index]

Output:

       5105            7076      
  Adj Close  Open Adj Close  Open
0      1.92  1.99      0.90  1.02
1      1.92  1.98      0.92  1.01
2      1.96  1.99      0.94  1.01
3      2.07  2.05      0.94  1.06
4      2.08  2.15      0.95  1.06

Upvotes: 0

caiolopes

Reputation: 571

you can try this:

for a, b in df.columns:
    if df[a][b].duplicated(keep=False).sum() == df[a][b].size:
        df.drop((a, b), axis=1, inplace=True)

Result:

   5105            7076          
   Open Adj Close  Open Adj Close
0  1.99      1.92  1.02      0.90
1  1.98      1.92  1.01      0.92
2  1.99      1.96  1.01      0.94
3  2.05      2.07  1.06      0.94
4  2.15      2.08  1.06      0.95

Upvotes: 0

rhug123

Reputation: 8768

Try this:

df.drop('5229',level=0,axis=1)

Output:

   5105            7076          
   Open Adj Close  Open Adj Close
0  1.99      1.92  1.02      0.90
1  1.98      1.92  1.01      0.92
2  1.99      1.96  1.01      0.94
3  2.05      2.07  1.06      0.94
4  2.15      2.08  1.06      0.95

Upvotes: 0

Pandas: Conditionally dropping columns based on same values throughout the column in MultiIndex dataframe

Edit

Answers (4)

Related Questions