user3084006

Reputation: 5564

Pandas counting and summing specific conditions

Are there single functions in pandas that perform the equivalents of Excel's SUMIF, which sums values that meet a condition, and COUNTIF, which counts values that meet a condition?

I know that there are many multi-step approaches that can be used for this.

For example, for SUMIF I can use (df.map(lambda x: condition) or df.size()) and then .sum(), and for COUNTIF I can use groupby functions and look for my answer, or use a filter and .count().

Is there a simple one-step process for these functions, where you enter the condition and the dataframe and get the summed or counted result?

Upvotes: 115

Views: 393360

Answers (4)

cottontail

Reputation: 23081

For multiple conditions e.g. COUNTIFS/SUMIFS, a convenient method is query because it's very fast for large frames (where performance actually matters) and you don't need to worry about parentheses, bitwise-and etc. For example, to compute =SUMIFS(C2:C8, A2:A8,">1", B2:B8, "<3"), you can use

df.query("A>1 and B<3")['C'].sum()
# or 
df.iloc[:8].query("A>1 and B<3")['C'].sum()    # where the range is specified as in SUMIFS
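
As a minimal, self-contained sketch of the above (the sample frame is made up for illustration and is not part of the original example), the query form and the explicit boolean-mask form give the same total:

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the idea
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 4, 5, 6],
    "B": [5, 4, 3, 2, 1, 0, 2],
    "C": [10, 20, 30, 40, 50, 60, 70],
})

# SUMIFS-style: sum C over the rows where A > 1 and B < 3
sumifs_query = df.query("A > 1 and B < 3")["C"].sum()

# Same filter as a boolean mask; each comparison needs its own
# parentheses because `&` binds tighter than `>` and `<`
sumifs_mask = df.loc[(df["A"] > 1) & (df["B"] < 3), "C"].sum()

print(sumifs_query, sumifs_mask)
```

Which form reads better is mostly a matter of taste; the mask version is handy when the condition is built up programmatically.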

For COUNTIFS, you can simply sum over the condition. For example, to compute =COUNTIFS(A2:A8,">1", B2:B8, "<3"), you can do:

countifs = ((df['A']>1) & (df['B']<3)).sum()

or just call query and compute the length of the result.

countifs = len(df.query("A>1 and B<3"))

You can also specify the range similar to how range is fed to COUNTIFS using iloc:

countifs = len(df.iloc[:8].query("A>1 and B<3"))
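
To see the three COUNTIFS variants side by side, here is a small sketch on made-up data (the frame and the 5-row range are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 4, 5, 6],
    "B": [5, 4, 3, 2, 1, 0, 2],
})

# Summing a boolean Series counts its True values
mask = (df["A"] > 1) & (df["B"] < 3)
count_from_mask = int(mask.sum())

# Length of the filtered frame gives the same number
count_from_query = len(df.query("A > 1 and B < 3"))

# Restrict the "range" to the first 5 rows before filtering
count_first_five = len(df.iloc[:5].query("A > 1 and B < 3"))

print(count_from_mask, count_from_query, count_first_five)
```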

To perform row-wise COUNTIF/SUMIF, you can use the axis=1 argument. Again, the range is given as a list of columns (['A', 'B']), similar to how a range is fed to COUNTIF.

As with the pandas equivalent of COUNTIFS above, for COUNTIF it suffices to sum over the condition, while for SUMIF we need to index the frame.

df['COUNTIF'] = (df[['A', 'B']] > 1).sum(axis=1)
df['SUMIF'] = df[df[['A', 'B']] > 1].sum(axis=1)
# equivalently, we can use `where` to make a filter as well
df['SUMIF'] = df.where(df[['A', 'B']] > 1, 0).sum(axis=1)

# can use `agg` to compute countif and sumif in one line.
df[['COUNTIF', 'SUMIF']] = df[df[['A', 'B']] > 1].agg(['count', 'sum'], axis=1)
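
A runnable sketch of the row-wise case, assuming a small made-up frame with an extra column C that is outside the range. Here the range columns are selected before filtering, which makes it explicit that C never enters the sum:

```python
import pandas as pd

# Hypothetical sample data; C is deliberately outside the "range" A:B
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 0, 2], "C": [9, 9, 9]})

rng = df[["A", "B"]]          # the range, as a list of columns
cond = rng > 1                # boolean frame with the same shape as rng

# Row-wise COUNTIF: number of True values in each row
rowwise_countif = cond.sum(axis=1)

# Row-wise SUMIF: replace non-matching cells with 0, then sum each row
rowwise_sumif = rng.where(cond, 0).sum(axis=1)

print(rowwise_countif.tolist(), rowwise_sumif.tolist())
```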


To perform column-wise COUNTIF/SUMIF, you can use the axis=0 argument (which is the default). The range here (the first 3 rows) is selected using iloc.

df.loc['COUNTIF'] = (df.iloc[:3] > 1).sum()
df.loc['SUMIF'] = df.where(df.iloc[:3] > 1, 0).sum()
# or
df.loc['SUMIF'] = df[df.iloc[:3] > 1].sum()
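
The column-wise case can be sketched the same way on made-up data. This version keeps the results as separate Series instead of appending rows to the frame, which is only a design choice for the illustration: it avoids earlier summary rows leaking into later calculations.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 0, 2, 7]})

top = df.iloc[:3]             # the range: first 3 rows only

# Column-wise COUNTIF: True values per column (axis=0 is the default)
colwise_countif = (top > 1).sum()

# Column-wise SUMIF: zero out non-matching cells, then sum per column
colwise_sumif = top.where(top > 1, 0).sum()

print(colwise_countif.to_dict(), colwise_sumif.to_dict())
```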


For COUNTIF/SUMIF across multiple rows/columns, e.g. =COUNTIF(A2:B4, ">1"), call sum twice (once for the column-wise sums and then once more across those column sums).

countif = (df.iloc[:4, :2]>1).sum().sum()    # the range is determined using iloc
sumif = df[df.iloc[:4, :2] > 1].sum().sum()  # first 4 rows and first 2 columns
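
For a rectangular range, a hedged sketch on made-up data (the 3-by-2 block is an assumption for illustration) shows the double sum next to a flattened NumPy alternative that gives the same count:

```python
import pandas as pd

# Hypothetical sample data; C lies outside the selected block
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 0, 2, 7], "C": [8, 8, 8, 8]})

block = df.iloc[:3, :2]       # first 3 rows, first 2 columns
cond = block > 1

# Two sums: per column first, then across the column totals
countif = int(cond.sum().sum())
sumif = int(block.where(cond, 0).sum().sum())

# Alternative: sum the underlying boolean array in one call
countif_flat = int(cond.to_numpy().sum())

print(countif, sumif, countif_flat)
```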

Upvotes: 3

Jimmy C

Reputation: 9670

You can first make a conditional selection, and sum up the results of the selection using the sum function.

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3]})
>>> df[df.a > 1].sum()
a    5
dtype: int64

Having more than one condition:

>>> df[(df.a > 1) & (df.a < 3)].sum()
a    2
dtype: int64

If you want to do COUNTIF, just replace sum() with count().
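
One thing worth keeping in mind: count() counts non-null values per column, so on a frame with missing data it can differ from the number of matching rows. A small sketch, assuming a made-up second column b with gaps:

```python
import pandas as pd

# Hypothetical data: column b has missing values
df = pd.DataFrame({"a": [1, 2, 3], "b": [None, 5.0, None]})

selected = df[df["a"] > 1]

# count() reports non-null values in each column of the selection
per_column_counts = selected.count()

# Summing the condition counts matching rows regardless of NaN elsewhere
matching_rows = int((df["a"] > 1).sum())

print(per_column_counts.to_dict(), matching_rows)
```

If the number of matching rows is what you want, summing the condition (or taking len of the selection) is the safer choice.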

Upvotes: 138

dan12345

Reputation: 1614

I usually use numpy sum over the logical condition column:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'Age' : [20,24,18,5,78]})
>>> np.sum(df['Age'] > 20)
2

This seems to me slightly shorter than the solution presented above.
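
For comparison, the same count is also available without NumPy, since a boolean Series has its own sum method; mean gives the matching fraction. A short sketch on the same data (the variable names are mine, for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [20, 24, 18, 5, 78]})

older_than_20 = df["Age"] > 20        # boolean Series

count_np = int(np.sum(older_than_20))   # NumPy sum, as above
count_pd = int(older_than_20.sum())     # Series method, same result
share = float(older_than_20.mean())     # fraction of rows that match

print(count_np, count_pd, share)
```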

Upvotes: 16

Thorsten Kranz

Reputation: 12755

You didn't mention the fancy indexing capabilities of dataframes, e.g.:

>>> df = pd.DataFrame({"class":[1,1,1,2,2], "value":[1,2,3,4,5]})
>>> df[df["class"]==1].sum()
class    3
value    6
dtype: int64
>>> df[df["class"]==1].sum()["value"]
6
>>> df[df["class"]==1].count()["value"]
3

You could replace df["class"]==1 with another condition.
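
If you want something that reads like a one-step SUMIF/COUNTIF, you can wrap this indexing in small helpers. Note that sumif and countif below are hypothetical names defined here for illustration, not pandas functions:

```python
import pandas as pd

def sumif(frame, condition, column):
    """Sum `column` over the rows where the boolean Series `condition` is True.

    Hypothetical helper, not part of pandas.
    """
    return frame.loc[condition, column].sum()

def countif(condition):
    """Count the True values in the boolean Series `condition`.

    Hypothetical helper, not part of pandas.
    """
    return int(condition.sum())

df = pd.DataFrame({"class": [1, 1, 1, 2, 2], "value": [1, 2, 3, 4, 5]})

total = sumif(df, df["class"] == 1, "value")
n = countif(df["class"] == 1)

print(total, n)
```

Using .loc with the condition and the column name selects only the needed column before summing, instead of summing every column and then picking one.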

Upvotes: 57
