Reputation: 2611
I am trying to build a contingency table in Python using pandas. Here is what my data looks like in a pandas DataFrame:
InvoiceNo Item Quantity
123 a 1
123 b 2
123 c 1
124 a 1
124 d 3
125 c 1
125 b 2
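For reproducibility, the sample above could be built as a DataFrame roughly like this (a minimal sketch; the variable name `df` is an assumption, and the dtypes are whatever pandas infers):

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "InvoiceNo": [123, 123, 123, 124, 124, 125, 125],
    "Item": ["a", "b", "c", "a", "d", "c", "b"],
    "Quantity": [1, 2, 1, 1, 3, 1, 2],
})
```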
So I need to build a table that makes it easy to see which items are bought together, like below:
Item Bought Together:
   a  b  c  d
a  2  1  1  1
b  1  2  2  0
c  1  2  2  0
d  1  0  0  1
Here, each diagonal element is the number of invoices in which that item appears.
How can I build this structure efficiently?
Upvotes: 1
Views: 1111
Reputation: 863056
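Use DataFrame.merge to join the data with itself on InvoiceNo (in effect a cross join within each invoice), then count the resulting item pairs with crosstab. To clean up the index and column names, use DataFrame.rename_axis: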
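import pandas as pd
# Self-join on InvoiceNo: each pair of items sharing an invoice becomes one row
df = df.merge(df, on='InvoiceNo')
# Count each (Item_x, Item_y) pair, then drop the index and column names
df = pd.crosstab(df['Item_x'], df['Item_y']).rename_axis(None).rename_axis(None, axis=1)
print(df)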
   a  b  c  d
a  2  1  1  1
b  1  2  2  0
c  1  2  2  0
d  1  0  0  1
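Since the question asks about efficiency: the self-merge creates one row per item pair per invoice, which can grow quickly when invoices are large. A possible alternative, sketched here and not part of the answer above, is to build an invoice-by-item indicator matrix and multiply it by its own transpose. The names `basket` and `cooc` are made up for this example, and the sample data is rebuilt so the snippet runs on its own:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "InvoiceNo": [123, 123, 123, 124, 124, 125, 125],
    "Item": ["a", "b", "c", "a", "d", "c", "b"],
    "Quantity": [1, 2, 1, 1, 3, 1, 2],
})

# Invoice-by-item indicator matrix: 1 if the item is on the invoice, else 0.
# clip(upper=1) keeps a repeated item on one invoice from being counted twice.
basket = pd.crosstab(df["InvoiceNo"], df["Item"]).clip(upper=1)

# (items x invoices) times (invoices x items): each cell is the number of
# invoices that contain both items; the diagonal is the per-item invoice count.
cooc = basket.T.dot(basket).rename_axis(None).rename_axis(None, axis=1)
print(cooc)
```

On this sample it gives the same table as the merge approach. The two differ if an item is listed more than once on the same invoice: the self-merge counts every duplicate pairing, while the clipped indicator matrix counts each invoice once.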
Upvotes: 1