Reputation: 331
I have a table like this:
In [2]: df = pd.DataFrame({
...: 'donorID':[101,101,101,102,103,101,101,102,103],
...: 'recipientID':[11,11,21,21,31,11,21,31,31],
...: 'amount':[100,200,500,200,200,300,200,200,100],
...: 'year':[2014,2014,2014,2014,2014,2015,2015,2015,2015]
...: })
In [3]: df
Out[3]:
amount donorID recipientID year
0 100 101 11 2014
1 200 101 11 2014
2 500 101 21 2014
3 200 102 21 2014
4 200 103 31 2014
5 300 101 11 2015
6 200 101 21 2015
7 200 102 31 2015
8 100 103 31 2015
I'd like to count, for each donor, the number of donor-recipient pairs with donations in n years (donations made by the same donor to the same recipient; n could be any number and the years don't have to be consecutive, but I use 2 here to keep things simple). In this case, donor 101 donated to recipients 11 and 21 in both 2014 and 2015, so the count for 101 is 2. The count for 102 is 0, and for 103 it is 1. The result table would look like this:
donorID num_donation_2_years
0 101 2
1 102 0
2 103 1
I've tried to use groupby and pivot_table but didn't manage to get the right answer. Any suggestions in pandas would be appreciated. Thanks!
Upvotes: 2
Views: 1564
Reputation: 7994
df_new = df.groupby(["donorID", "recipientID"])["year"].nunique().reset_index(name="year_count")
df_for_query = df_new.groupby(["donorID", "year_count"]).size().reset_index(name='numb_recipient')
donorID year_count numb_recipient
0 101 2 2
1 102 1 2
2 103 2 1
The third column is how many recipients fit the year condition. Line 0 says donor 101 has 2 recipients that he/she donated to in exactly two years. This is not exactly your output, but you can query it easily from this df.
If you want to find the number of recipients a donor donated to over some number of years, say 2, run
df_for_query.query("year_count == 2")
donorID year_count numb_recipient
0 101 2 2
2 103 2 1
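If you need the exact table from the question (including donor 102 with a count of 0), one way to get there from df_new is sketched below; the reindex over all donors fills in the zeros, and the column name num_donation_2_years is taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'donorID': [101, 101, 101, 102, 103, 101, 101, 102, 103],
    'recipientID': [11, 11, 21, 21, 31, 11, 21, 31, 31],
    'amount': [100, 200, 500, 200, 200, 300, 200, 200, 100],
    'year': [2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015],
})

# per-pair count of distinct years, as above
df_new = (df.groupby(["donorID", "recipientID"])["year"]
            .nunique().reset_index(name="year_count"))

# keep pairs spanning >= 2 years, count them per donor,
# and reindex over all donors so 102 appears with 0
result = (df_new.query("year_count >= 2")
                .groupby("donorID").size()
                .reindex(df["donorID"].unique(), fill_value=0)
                .rename("num_donation_2_years")
                .reset_index())
```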
Thanks to Wen for the inspiration to use nunique!
Upvotes: 1
Reputation: 24565
The following code works (explanation as comments; ol for outlist). Note: tuple keys are used instead of string concatenation, which would collide for multi-digit IDs, and duplicate (donor, recipient, year) rows are dropped so only distinct years are counted:
# one row per distinct donor-recipient-year combination
pairs = df.drop_duplicates(['donorID', 'recipientID', 'year'])
# count distinct years per donor-recipient pair
ol = pd.value_counts(list(zip(pairs.donorID, pairs.recipientID)))
ol = ol[ol >= 2]  # choose only those >= 2
ol.index = [d for d, r in ol.index]  # get donorID back from the pair
print(pd.value_counts(ol.index))  # print desired frequency
Output:
101 2
103 1
dtype: int64
Upvotes: 0
Reputation: 323306
Something like
df1=df.groupby('donorID').apply(lambda x : x.groupby(x.recipientID).year.nunique().gt(1).sum())
df1
Out[102]:
donorID
101 2
102 0
103 1
dtype: int64
To get the dataframe
df1.to_frame('num_donation_2_years').reset_index()
Out[104]:
donorID num_donation_2_years
0 101 2
1 102 0
2 103 1
As Dark mentions, avoid using apply. This is the update:
df1=df.groupby(['donorID','recipientID']).year.nunique().gt(1).sum(level=0)
df1
Out[109]:
donorID
101 2.0
102 0.0
103 1.0
Name: year, dtype: float64
df1.to_frame('num_donation_2_years').reset_index()
Out[104]:
donorID num_donation_2_years
0 101 2
1 102 0
2 103 1
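Since the question says n could be any number, the same idea generalizes by swapping gt(1) for ge(n). A sketch, using groupby(level=0).sum() rather than sum(level=0) (recent pandas versions removed the level argument), with n as an assumed parameter name:

```python
import pandas as pd

df = pd.DataFrame({
    'donorID': [101, 101, 101, 102, 103, 101, 101, 102, 103],
    'recipientID': [11, 11, 21, 21, 31, 11, 21, 31, 31],
    'amount': [100, 200, 500, 200, 200, 300, 200, 200, 100],
    'year': [2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015],
})

n = 2  # minimum number of distinct years; any value works
df1 = (df.groupby(['donorID', 'recipientID'])['year']
         .nunique()                # distinct years per donor-recipient pair
         .ge(n)                    # does the pair span at least n years?
         .groupby(level=0).sum()   # count qualifying pairs per donor
         .astype(int))
```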
Upvotes: 3
Reputation: 30605
An improvement to @Wen's solution, avoiding apply for more speed:
one = df.groupby(['donorID','recipientID'])['year'].nunique().gt(1)
two = one.groupby(level=0).sum().to_frame('no_of_donations_2_years').reset_index()
donorID no_of_donations_2_years
0 101 2.0
1 102 0.0
2 103 1.0
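The counts come out as floats (2.0, 0.0, ...) because the summed booleans get upcast. If the integer dtype from the question's expected output matters, a cast fixes it; a sketch of the same pipeline with astype(int) added:

```python
import pandas as pd

df = pd.DataFrame({
    'donorID': [101, 101, 101, 102, 103, 101, 101, 102, 103],
    'recipientID': [11, 11, 21, 21, 31, 11, 21, 31, 31],
    'amount': [100, 200, 500, 200, 200, 300, 200, 200, 100],
    'year': [2014, 2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015],
})

one = df.groupby(['donorID', 'recipientID'])['year'].nunique().gt(1)
two = (one.groupby(level=0).sum()
          .astype(int)  # cast the summed booleans back to int
          .to_frame('no_of_donations_2_years').reset_index())
```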
Upvotes: 3