Jairo Martínez
Jairo Martínez

Reputation: 525

Apply two filter conditions on python pandas dataframe groupby

I have a Pandas DataFrame as follow:

In [28]: df = pd.DataFrame({'A':['CA', 'FO', 'CAP', 'CP'],
                            'B':['Name1', 'Name2', 'Name3', 'Name4'],
                            'C':['One', 'Two', 'Other', 'Some']})

In [29]: df
Out[29]:
    A      B      C
0   CA  Name1    One
1   FO  Name2    Two
2  CAP  Name3  Other
3   CP  Name4   Some

I am trying to count all records in column A with values of 'CA' and 'CP', to do this I am executing the next:

In [30]: len(df.groupby('A').filter(lambda x: x['A'] == 'CA'))
Out[30]: 1

There is a way to get both information in a single sentence?, because if I try to do something like this:

In [32]: len(df.groupby('A').filter(lambda x: x['A'] == 'CA' or
   ....:                                      x['A'] == 'CP'))

I am getting this error:

ValueError                                Traceback (most recent call last)
<ipython-input-32-111c3fde30f2> in <module>()
----> 1 len(df.groupby('A').filter(lambda x: x['A'] == 'CA') or
      2                                      x['A'] == 'CP')

c:\python27\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
    885         raise ValueError("The truth value of a {0} is ambiguous. "
    886                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 887                          .format(self.__class__.__name__))
    888
    889     __bool__ = __nonzero__

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 3

Views: 769

Answers (2)

EdChum
EdChum

Reputation: 394051

Use isin and pass a list to filter the df prior to getting the size:

In [4]:
len(df[df['A'].isin(['CA','CP'])])

Out[4]:
2

Upvotes: 1

Fabio Lamanna
Fabio Lamanna

Reputation: 21552

I think you don't need to groupby, just use a mask and the "or" operator (which is | in pandas):

In [3]: df
Out[3]: 
     A      B      C
0   CA  Name1    One
1   FO  Name2    Two
2  CAP  Name3  Other
3   CP  Name4   Some

In [4]: c = df[(df['A']=='CA') | (df['A']=='CP')]

In [5]: c
Out[5]: 
    A      B     C
0  CA  Name1   One
3  CP  Name4  Some

In [6]: len(c)
Out[6]: 2

Upvotes: 1

Related Questions