Reputation: 588
Assume, I have the following simple data frame.
d = {'Date': ['20/01/2019', '21/01/2019', '20/01/2019','21/01/2019', '21/01/2019' ,'20/01/2019','21/01/2019'],
'Customer': ['C1' ,'C2', 'C1', 'C3', 'C1', 'C2', 'C2'],
'Merchant':['M1', 'M1', 'M2', 'M1','M3', 'M1','M3']}
df = pd.DataFrame(data=d)
df
Date Customer Merchant
20/01/2019 C1 M1
21/01/2019 C2 M1
20/01/2019 C1 M2
21/01/2019 C3 M1
21/01/2019 C1 M3
20/01/2019 C2 M1
21/01/2019 C2 M3
I want to group by Date
considering customer to merchant combination.
For example, C1
(customer 1), appears 3 times in the transaction but with different date in three different merchants. I want to group occurrence of customer 1 based on date and put the combination in a new column Combination
. The same works for the other customers. Then I want count how many times a merchant (M1, M2, and M3
) listed by a customer. This can be done by creating columns Merchant_item
and their Frequency
. In addition, if a row is combined with other one, I want to delete the second repeated row as shown below.
Date Customer Merchant Combination Merchant_item Frequency
20/01/2019 C1 M1 {M1, M2} M1 3
21/01/2019 C2 M1 {M1, M3} M2 1
21/01/2019 C3 M1 {M1} M3 3
21/01/2019 C1 M3 {M3}
20/01/2019 C2 M3 {M3}
Can any one help on this in Python Pandas? Any help is appreciated!
Upvotes: 0
Views: 180
Reputation: 2488
This is how you do the initial grouping into a list and how you count the number of times merchants appear in a grouped list. As discussed in comments, I think the structure of your desired output is problematic so I have included the counts of each merchant that appears in in a Merchant tuple.
g = df.groupby(['Date','Customer']).aggregate(lambda x: tuple(x))
m = g.reset_index()['Merchant'].explode().value_counts()
count = g['Merchant'].apply(lambda x: m[m.index.isin(x)].values.tolist()).values
pd.concat([g.reset_index(), pd.Series(count, name='MerchantCount')], axis=1)
Date Customer Merchant MerchantCount
0 20/01/2019 C1 (M1, M2) [4, 1]
1 20/01/2019 C2 (M1,) [4]
2 21/01/2019 C1 (M3,) [2]
3 21/01/2019 C2 (M1, M3) [4, 2]
4 21/01/2019 C3 (M1,) [4]
Referring to your comment, the merchant count is in the variable, m
:
M1 4
M3 2
M2 1
Name: Merchant, dtype: int64
Upvotes: 1