Reputation: 1197
What are some pandas approaches for counting rows that match any of several conditions?
For example:
import numpy as np
import pandas as pd

df = pd.DataFrame({ 'A' : ["1","2","3","4"],
                    'B' : pd.Timestamp('20130102'),
                    'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                    'D' : np.array([3] * 4,dtype='int32'),
                    'E' : pd.Categorical(["test","train","test","train"]),
                    'F' : 'foo' })
df
The line below shows how I count rows for a single condition:
print ("Sum for 1 and 3:",(df['A']=="1").sum(),"records")
What are some ways to count both "1" and "3"?
In the above example, I would expect an output of Sum for 1 and 3: 2 records
Upvotes: 1
Views: 48
Reputation: 18638
In this case you can use np.in1d, which tests membership:
np.in1d(df["A"],["1","3"]).sum()
This is very fast.
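For comparison, here is a minimal sketch of the same membership test using pandas' own Series.isin, plus np.isin, which newer NumPy releases recommend over np.in1d. The reduced one-column frame is only for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative frame with only the column being tested
df = pd.DataFrame({"A": ["1", "2", "3", "4"]})

# Series.isin returns a boolean mask: True where the value is in the list
count_pandas = int(df["A"].isin(["1", "3"]).sum())

# np.isin performs the same element-wise membership test on the values
count_numpy = int(np.isin(df["A"], ["1", "3"]).sum())

print("Sum for 1 and 3:", count_pandas, "records")
```

Both counts should come out as 2 for this data.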
Upvotes: 2
Reputation: 862761
You can use:
print ("Sum for 1 and 3:",((df['A']=="1") | (df['A']=="3")).sum(),"records")
('Sum for 1 and 3:', 2, 'records')
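The same pattern extends to conditions that must all hold: swap | for & and keep each comparison in its own parentheses, since both operators bind more tightly than ==. A small sketch, assuming a frame with just columns A and E from the question:

```python
import pandas as pd

# Reduced version of the question's frame (illustrative only)
df = pd.DataFrame({
    "A": ["1", "2", "3", "4"],
    "E": pd.Categorical(["test", "train", "test", "train"]),
})

# OR: rows where A is "1" or A is "3"
either = int(((df["A"] == "1") | (df["A"] == "3")).sum())

# AND: rows where A is "1" and E is "test"
both = int(((df["A"] == "1") & (df["E"] == "test")).sum())

print("either:", either, "both:", both)
```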
Or use str.contains with | (or):
print ("Sum for 1 and 3:",(df['A'].str.contains("1|3")).sum(),"records")
('Sum for 1 and 3:', 2, 'records')
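One caveat worth checking on your own data: str.contains does a regex search, so "1|3" also matches longer strings such as "13". The sketch below uses a hypothetical series with such values and contrasts it with str.fullmatch (available in pandas 1.1 and later), which only accepts whole-string matches.

```python
import pandas as pd

# Hypothetical values, including strings that merely contain "1" or "3"
s = pd.Series(["1", "2", "3", "4", "13", "31"])

# Substring search: counts "1", "3", "13" and "31"
substring_count = int(s.str.contains("1|3").sum())

# Whole-string match: counts only "1" and "3"
exact_count = int(s.str.fullmatch("1|3").sum())

print(substring_count, exact_count)
```

For the question's data the two agree, because every value in column A is a single character.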
A faster approach may be to use np.sum:
print ("Sum for 1 and 3:",np.sum(df['A'].str.contains("1|3")),"records")
('Sum for 1 and 3:', 2, 'records')
Upvotes: 1