Jack Daniel

Reputation: 2611

Build Contingency Table in Python

I am trying to build a contingency table in Python using pandas. Here is what my data looks like in a pandas DataFrame:

InvoiceNo Item Quantity
123        a     1
123        b     2
123        c     1
124        a     1
124        d     3
125        c     1
125        b     2

I need to build a table that shows which items were bought together, like the one below:

Items Bought Together:

   a  b  c  d
a  2  1  1  1
b  1  2  2  0
c  1  2  2  0
d  1  0  0  1

Here, the diagonal elements represent the frequency of the item across all the invoices.

How can I build this structure efficiently?

Upvotes: 1

Views: 1111

Answers (1)

jezrael

Reputation: 863056

Use DataFrame.merge to self-join the frame on InvoiceNo, which pairs every item with every item on the same invoice, then count the pairs with crosstab. DataFrame.rename_axis removes the index and column names:

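import pandas as pd
# Self-merge on InvoiceNo: one row per (Item_x, Item_y) pair within an invoice
df = df.merge(df, on='InvoiceNo')
# Count the pairs, then drop the axis names for a cleaner display
df = pd.crosstab(df['Item_x'], df['Item_y']).rename_axis(None).rename_axis(None, axis=1)
print(df)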
   a  b  c  d
a  2  1  1  1
b  1  2  2  0
c  1  2  2  0
d  1  0  0  1
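
If the self-merge gets too large on big invoices, one possible alternative (a sketch, not part of the original answer) is to build an invoice-by-item indicator matrix and multiply it by its own transpose. The names basket and co_occurrence are made up for illustration, and the sample frame is rebuilt from the question's data. Clipping to 1 is an assumption that an item repeated on one invoice should count once:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "InvoiceNo": [123, 123, 123, 124, 124, 125, 125],
    "Item": ["a", "b", "c", "a", "d", "c", "b"],
    "Quantity": [1, 2, 1, 1, 3, 1, 2],
})

# Invoice-by-item indicator matrix: 1 if the item appears on the invoice.
# clip(upper=1) is an assumption: repeated items on one invoice count once.
basket = pd.crosstab(df["InvoiceNo"], df["Item"]).clip(upper=1)

# For each item pair, the product sums the invoices that contain both items;
# the diagonal is the number of invoices each item appears on.
co_occurrence = (basket.T @ basket).rename_axis(None).rename_axis(None, axis=1)
print(co_occurrence)
```

On the sample data this gives the same counts as the merge approach; the two would differ only if an item appeared more than once on the same invoice.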

Upvotes: 1
