Reputation:
I have a dataframe as below:
data = {('5105', 'Open'): [1.99,1.98,1.99,2.05,2.15],
('5105', 'Adj Close'): [1.92,1.92,1.96,2.07,2.08],
('5229', 'Open'): [0.01]*5,
('5229', 'Adj Close'): [0.02]*5,
('7076', 'Open'): [1.02,1.01,1.01,1.06,1.06],
('7076', 'Adj Close'): [0.90,0.92,0.94,0.94,0.95]}
df = pd.DataFrame(data)
5105 5229 7076
Open Adj Close Open Adj Close Open Adj Close
0 1.99 1.92 0.01 0.02 1.02 0.90
1 1.98 1.92 0.01 0.02 1.01 0.92
2 1.99 1.96 0.01 0.02 1.01 0.94
3 2.05 2.07 0.01 0.02 1.06 0.94
4 2.15 2.08 0.01 0.02 1.06 0.95
As the dataframe above, we can see that df['5229']
has both columns Open
and Adj Close
having the same values respectively throughout the column. So, I intend to drop it since it will not be useful in my analysis.
I have two queries:
As this is a conditional-based dropping, I was wondering if df.drop
still works in this case?
Based on my 1st and 2nd query, in my case above, since the Open
and Adj Close
are having same values throughout the column, I would like to drop it entirely.
The expected output is:
5105 7076
Open Adj Close Open Adj Close
0 1.99 1.92 1.02 0.90
1 1.98 1.92 1.01 0.92
2 1.99 1.96 1.01 0.94
3 2.05 2.07 1.06 0.94
4 2.15 2.08 1.06 0.95
Really thank you for those answering the question. Just to be more concise, I was trying to drop the columns from the dataframe consisting of more than 200 columns given the condition if all the values in that particular column are the same.
Upvotes: 0
Views: 111
Reputation: 323226
Try with nunique
df = df.loc[:,~(df.nunique()==1).values]
Out[125]:
5105 7076
Open Adj Close Open Adj Close
0 1.99 1.92 1.02 0.90
1 1.98 1.92 1.01 0.92
2 1.99 1.96 1.01 0.94
3 2.05 2.07 1.06 0.94
4 2.15 2.08 1.06 0.95
Upvotes: 1
Reputation:
We could use unstack
+ groupby
+ nunique
to get the number of unique values in each column. Then select only the columns with more than 1 value by the loc
:
out = df[df.unstack().groupby(level=[0,1]).nunique().loc[lambda x: x!=1].index]
Output:
5105 7076
Adj Close Open Adj Close Open
0 1.92 1.99 0.90 1.02
1 1.92 1.98 0.92 1.01
2 1.96 1.99 0.94 1.01
3 2.07 2.05 0.94 1.06
4 2.08 2.15 0.95 1.06
Upvotes: 0
Reputation: 571
you can try this:
for a, b in df.columns:
if df[a][b].duplicated(keep=False).sum() == df[a][b].size:
df.drop((a, b), axis=1, inplace=True)
Result:
5105 7076
Open Adj Close Open Adj Close
0 1.99 1.92 1.02 0.90
1 1.98 1.92 1.01 0.92
2 1.99 1.96 1.01 0.94
3 2.05 2.07 1.06 0.94
4 2.15 2.08 1.06 0.95
Upvotes: 0
Reputation: 8768
Try this:
df.drop('5229',level=0,axis=1)
Output:
5105 7076
Open Adj Close Open Adj Close
0 1.99 1.92 1.02 0.90
1 1.98 1.92 1.01 0.92
2 1.99 1.96 1.01 0.94
3 2.05 2.07 1.06 0.94
4 2.15 2.08 1.06 0.95
Upvotes: 0