eshfaq ahmad

Reputation: 123

Logical AND of multiple columns in pandas

I have a dataframe (edata) as given below:

Domestic   Catsize    Type   Count
   1          0         1      1
   1          1         1      8
   1          0         2      11
   0          1         3      14
   1          1         4      21
   0          1         4      31

From this dataframe, I want to calculate the sum of Count for all rows where the logical AND of the two variables (Domestic and Catsize) is zero (0), i.e.

Domestic   Catsize   AND
   1          0       0
   0          1       0
   0          0       0

The code I use for this is:

g=edata.groupby('Type')
q3=g.apply(lambda x:x[((x['Domestic']==0) & (x['Catsize']==0) |
                       (x['Domestic']==0) & (x['Catsize']==1) |
                       (x['Domestic']==1) & (x['Catsize']==0)
                       )]
            ['Count'].sum()
           )

q3

Type
1     1
2    11
3    14
4    31

This code works fine; however, if the number of variables in the dataframe increases, the number of conditions grows rapidly. So, is there a smarter way to write the condition, stating that if ANDing the two (or more) variables results in zero, then perform the sum()?

Upvotes: 7

Views: 5581

Answers (3)

qbit

Reputation: 45

How about:

columns = ['Domestic', 'Catsize']
df.loc[~df[columns].prod(axis=1).astype(bool), 'Count']

And then do with it whatever you want.
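
For example, to get the per-Type sums from the question, a sketch reusing the columns list and df from above:

# Keep rows where the row-wise product (logical AND) of the flag columns is 0,
# then aggregate Count per Type, matching the output in the question.
mask = ~df[columns].prod(axis=1).astype(bool)
df[mask].groupby('Type')['Count'].sum()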

For logical AND, the product does the trick nicely. For logical OR you can use sum(axis=1), with proper negation in advance.
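
A minimal sketch of the OR side, assuming the flag columns only contain 0/1 as in the question:

# Row-wise logical OR: the sum is nonzero if at least one flag is set.
or_mask = df[columns].sum(axis=1).astype(bool)
# Negating the flags first (1 - x) and taking the product gives the AND of the
# negations, i.e. True only where every flag is 0 (De Morgan).
none_set = (1 - df[columns]).prod(axis=1).astype(bool)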

Upvotes: 0

cs95

Reputation: 402593

Use np.logical_and.reduce to generalise.

import numpy as np

columns = ['Domestic', 'Catsize']
df[~np.logical_and.reduce(df[columns], axis=1)].groupby('Type')['Count'].sum()

Type
1     1
2    11
3    14
4    31
Name: Count, dtype: int64

Before adding it back, use map to broadcast:

u = df[~np.logical_and.reduce(df[columns], axis=1)].groupby('Type')['Count'].sum()
df['NewCol'] = df.Type.map(u)

df
   Domestic  Catsize  Type  Count  NewCol
0         1        0     1      1       1
1         1        1     1      8       1
2         1        0     2     11      11
3         0        1     3     14      14
4         1        1     4     21      31
5         0        1     4     31      31
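
Since np.logical_and.reduce takes any number of columns, extending the filter is just a longer list. A sketch with a hypothetical extra flag column (Indoor, not in the original data):

# 'Indoor' is a made-up example column; the reduce works the same way for any
# number of 0/1 flag columns.
columns = ['Domestic', 'Catsize', 'Indoor']
df[~np.logical_and.reduce(df[columns], axis=1)].groupby('Type')['Count'].sum()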

Upvotes: 5

jpp

Reputation: 164693

You can filter first using pd.DataFrame.all negated:

cols = ['Domestic', 'Catsize']
res = df[~df[cols].all(1)].groupby('Type')['Count'].sum()

print(res)
# Type
# 1     1
# 2    11
# 3    14
# 4    31
# Name: Count, dtype: int64
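
For reference, a self-contained sketch that rebuilds the frame from the question (here called edata) and applies the same filter:

import pandas as pd

# Sample data copied from the question.
edata = pd.DataFrame({
    'Domestic': [1, 1, 1, 0, 1, 0],
    'Catsize':  [0, 1, 0, 1, 1, 1],
    'Type':     [1, 1, 2, 3, 4, 4],
    'Count':    [1, 8, 11, 14, 21, 31],
})

cols = ['Domestic', 'Catsize']
# Drop rows where every flag is set, then sum Count per Type.
print(edata[~edata[cols].all(axis=1)].groupby('Type')['Count'].sum())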

Upvotes: 5
