Hiwot
Hiwot

Reputation: 588

Grouping and computing frequency of items in Pandas Python

Assume, I have the following simple data frame.

d = {'Date': ['20/01/2019', '21/01/2019', '20/01/2019','21/01/2019', '21/01/2019' ,'20/01/2019','21/01/2019'], 
     'Customer': ['C1' ,'C2', 'C1', 'C3', 'C1', 'C2', 'C2'], 
     'Merchant':['M1', 'M1', 'M2', 'M1','M3', 'M1','M3']}
df = pd.DataFrame(data=d)
df
Date            Customer    Merchant
20/01/2019        C1        M1
21/01/2019        C2        M1
20/01/2019        C1        M2
21/01/2019        C3        M1
21/01/2019        C1        M3
20/01/2019        C2        M1
21/01/2019        C2        M3

I want to group by Date considering customer to merchant combination.

For example, C1 (customer 1), appears 3 times in the transaction but with different date in three different merchants. I want to group occurrence of customer 1 based on date and put the combination in a new column Combination. The same works for the other customers. Then I want count how many times a merchant (M1, M2, and M3) listed by a customer. This can be done by creating columns Merchant_item and their Frequency. In addition, if a row is combined with other one, I want to delete the second repeated row as shown below.

Date            Customer    Merchant   Combination   Merchant_item   Frequency 
20/01/2019        C1          M1         {M1, M2}       M1              3
21/01/2019        C2          M1         {M1, M3}       M2              1
21/01/2019        C3          M1         {M1}           M3              3
21/01/2019        C1          M3         {M3}
20/01/2019        C2          M3         {M3}

Can any one help on this in Python Pandas? Any help is appreciated!

Upvotes: 0

Views: 180

Answers (1)

forgetso
forgetso

Reputation: 2488

This is how you do the initial grouping into a list and how you count the number of times merchants appear in a grouped list. As discussed in comments, I think the structure of your desired output is problematic so I have included the counts of each merchant that appears in in a Merchant tuple.

g = df.groupby(['Date','Customer']).aggregate(lambda x: tuple(x))
m = g.reset_index()['Merchant'].explode().value_counts()

count = g['Merchant'].apply(lambda x: m[m.index.isin(x)].values.tolist()).values
pd.concat([g.reset_index(), pd.Series(count, name='MerchantCount')], axis=1)

Date    Customer    Merchant    MerchantCount
0   20/01/2019  C1  (M1, M2)    [4, 1]
1   20/01/2019  C2  (M1,)       [4]
2   21/01/2019  C1  (M3,)       [2]
3   21/01/2019  C2  (M1, M3)    [4, 2]
4   21/01/2019  C3  (M1,)       [4]

Referring to your comment, the merchant count is in the variable, m:

M1    4
M3    2
M2    1
Name: Merchant, dtype: int64

Upvotes: 1

Related Questions