Tokyo

Reputation: 823

Drop columns in pandas dataframe based on conditions

Assume that I have the following dataframe:

+---+---------+------+------+------+
|   | summary | col1 | col2 | col3 |
+---+---------+------+------+------+
| 0 | count   | 10   | 10   | 10   |
+---+---------+------+------+------+
| 1 | mean    | 4    | 5    | 5    |
+---+---------+------+------+------+
| 2 | stddev  | 3    | 3    | 3    |
+---+---------+------+------+------+
| 3 | min     | 0    | -1   | 5    |
+---+---------+------+------+------+
| 4 | max     | 100  | 56   | 47   |
+---+---------+------+------+------+

How can I keep only the columns where count > 5, mean > 4 and min > 0, while also keeping the summary column?

The desired output is:

+---+---------+------+
|   | summary | col3 |
+---+---------+------+
| 0 | count   | 10   |
+---+---------+------+
| 1 | mean    | 5    |
+---+---------+------+
| 2 | stddev  | 3    |
+---+---------+------+
| 3 | min     | 5    |
+---+---------+------+
| 4 | max     | 47   | 
+---+---------+------+
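For anyone who wants to reproduce this, the frame can be rebuilt roughly as follows. This is a minimal sketch that assumes the values are plain integers and that the frame is named `df`; the numbers are copied from the first table in the question.

```python
import pandas as pd

# Values copied from the first table in the question; integer dtypes are an assumption.
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})
print(df)
```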

Upvotes: 1

Views: 933

Answers (5)

piRSquared

Reputation: 294218

General thrashing about plus query

(
    df.set_index('summary')
      .rename(str.title).T
      .query('Count > 5 and Mean > 4 and Min > 0')
      .T.rename(str.lower)
      .reset_index()
)

  summary  col3
0   count    10
1    mean     5
2  stddev     3
3     min     5
4     max    47

Shenanigans

(
    df[['summary']].join(
        df.iloc[:, 1:].loc[:, df.iloc[[0, 1, 3], 1:].T.gt([5, 4, 0]).all(axis=1)]
    )
)
  summary  col3
0   count    10
1    mean     5
2  stddev     3
3     min     5
4     max    47
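The one-liner above is dense, so here is the same logic split into named steps. It is only a sketch: the intermediate names (`stats`, `passes`, `keep`) are mine, and the frame is rebuilt from the question's table assuming integer values.

```python
import pandas as pd

# Sample frame rebuilt from the question (integer dtypes assumed).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

# Rows 0, 1 and 3 hold count, mean and min; skip the 'summary' column.
stats = df.iloc[[0, 1, 3], 1:]

# After transposing, each row is a data column and each column a statistic,
# so the list [5, 4, 0] is compared against count, mean and min in that order.
passes = stats.T.gt([5, 4, 0])

# A data column is kept only if it passes all three comparisons.
keep = passes.all(axis=1)

result = df[["summary"]].join(df.iloc[:, 1:].loc[:, keep])
print(result)
```

Note that this variant depends on the row positions of count, mean and min, so it breaks if the rows are reordered.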

Upvotes: 1

Mark Wang

Reputation: 2757

loc with callable.

(df.set_index('summary').T
   .loc[lambda x: (x['count'] > 5) & (x['mean'] > 4) & (x['min'] > 0)]
   .T.reset_index())

Upvotes: 2

Karthik V

Reputation: 1897

Set the summary column as the index first (df = df.set_index('summary')), then do this:

df.T.query("(count > 5) & (mean > 4) & (min > 0)").T

Upvotes: 0

BENY

Reputation: 323226

Here is one way:

s = df.set_index('summary')
com = pd.Series([5, 4, 0], index=['count', 'mean', 'min'])
idx = s.loc[com.index].gt(com, axis=0).all().loc[lambda x: x].index
s[idx]
Out[142]: 
         col3
summary      
count      10
mean        5
stddev      3
min         5
max        47
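The result above keeps summary as the index rather than as a column. If the thresholds may change, the same idea can be wrapped in a small helper that also restores the summary column. This is a sketch only: `keep_columns` and its parameters are hypothetical names, not part of pandas, and the sample frame assumes integer values.

```python
import pandas as pd

def keep_columns(df, thresholds, label="summary"):
    """Keep the columns whose listed statistics are all strictly above their bound.

    `thresholds` maps a row label (e.g. 'count') to its lower bound.
    Hypothetical helper for illustration.
    """
    s = df.set_index(label)
    bounds = pd.Series(thresholds)
    # Compare each statistic row with its bound; align on the row labels.
    mask = s.loc[bounds.index].gt(bounds, axis=0).all()
    return s.loc[:, mask].reset_index()

# Sample frame rebuilt from the question (integer dtypes assumed).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

out = keep_columns(df, {"count": 5, "mean": 4, "min": 0})
print(out)
```

Because the rows are selected by label, this does not depend on the order of the statistics in the frame.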

Upvotes: 1

harpan

Reputation: 8631

You need:

df2 = df.set_index('summary').T
m1 = df2['count'] > 5
m2 = df2['mean'] > 4
m3 = df2['min'] > 0
df2.loc[m1 & m2 & m3].T.reset_index()

Output:

    summary col3
0   count   10
1   mean    5
2   stddev  3
3   min     5
4   max     47

Note: You can put the conditions directly inside .loc[], but with several conditions it is more readable to store each one in its own mask variable (m1, m2, m3) and combine them with &.
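To illustrate the note, here is a sketch comparing the inline form with the mask-variable form. The names `inline` and `masked` are mine, and the frame is rebuilt from the question's table assuming integer values; both forms should select the same columns.

```python
import pandas as pd

# Sample frame rebuilt from the question (integer dtypes assumed).
df = pd.DataFrame({
    "summary": ["count", "mean", "stddev", "min", "max"],
    "col1": [10, 4, 3, 0, 100],
    "col2": [10, 5, 3, -1, 56],
    "col3": [10, 5, 3, 5, 47],
})

df2 = df.set_index("summary").T

# Inline: all conditions written directly inside .loc[].
# Each comparison needs its own parentheses because & binds tighter than >.
inline = df2.loc[
    (df2["count"] > 5) & (df2["mean"] > 4) & (df2["min"] > 0)
].T.reset_index()

# Separate masks: the same conditions, named one by one.
m1 = df2["count"] > 5
m2 = df2["mean"] > 4
m3 = df2["min"] > 0
masked = df2.loc[m1 & m2 & m3].T.reset_index()

print(inline.equals(masked))
```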

Upvotes: 3
