Peter
Peter

Reputation: 383

pandas 0.20: df with columns of multi-level indexes - How do I filter with condition on multiple columns?

I want to find all rows where all 3 columns are >0. How do I do so? Thanks! I know that using loc with IndexSlicer can return a column of True/False.

But it doesn't work with condition for multiple columns, or return a table of values.

Importance| A         |  B   | C       |
Category | Cat1       | Cat2 | Cat1    |
         |Total Assets| AUMs | Revenue |
Firm 1   | 100        | 300  | 300     |
Firm 2   | 200        | 3400 | 200     |
Firm 3   | 300        | 800  | 400     |
Firm 4   | NaN        | 800  | 350     |

idx=pd.IndexSlice
df.sort_index(ascending=True, inplace=True, axis=1)
df.loc[:,idx[:,'Cat1','Total Assets']]>0

Importance| A         |  
Category | Cat1       | 
         |Total Assets| 
Firm 1   | T       | 
Firm 2   | T       | 
Firm 3   | T       |
Firm 4   | F       |

Desired Output:

Importance| A         |  B   | C       |
Category | Cat1       | Cat2 | Cat1    |
         |Total Assets| AUMs | Revenue |
Firm 1   | 100        | 300  | 300     |
Firm 2   | 200        | 3400 | 200     |
Firm 3   | 300        | 800  | 400     |

Upvotes: 0

Views: 50

Answers (1)

Corralien
Corralien

Reputation: 120429

IIUC:

>>> df[df.iloc[:, 1:].gt(0).all(axis=1)]

  Importance           A     B       C
    Category        Cat1  Cat2    Cat1
             TotalAssets  AUMs Revenue
0      Firm1       100.0   300     300
1      Firm2       200.0  3400     200
2      Firm3       300.0   800     400

Update

I only want the filtering for col 'TotalAssets'>0 & 'Revenue'>0?

# idx = pd.IndexSlice
>>> df[df.loc[:, idx[:, :, ['TotalAssets', 'Revenue']]].gt(0).all(axis=1)]

  Importance           A     B       C
    Category        Cat1  Cat2    Cat1
             TotalAssets  AUMs Revenue
0      Firm1       100.0   300     300
1      Firm2       200.0  3400     200
2      Firm3       300.0   800     400

Upvotes: 2

Related Questions