Transforming a data frame column into a counter matrix

Question

I have a data frame like this. Invoices are transformed into a table where each row has a single product. This table consists of millions of rows -

invoice	products
INVC1	xx
INVC1	yy
INVC1	zz
INVC1	aa
INVC2	xx
INVC2	aa
INVC2	bb
INVC2	cc

Now I want to create a table like the one below where I can see how many times each product is purchased with other products-

	xx	yy	zz	aa	bb	cc
xx	0	1	1	2	1	1
yy	1	0	1	1	0	0
zz	1	1	0	1	0	0
aa	2	1	1	0	1	1
bb	1	0	0	1	0	1
cc	1	0	0	1	1	0

Is there any numpy/pandas way to create a table like this? Or, is there any optimized way to do this.

BENY · Accepted Answer

Let us use pd.crosstab then dot

s = pd.crosstab(df['invoice'],df['products'])
out = s.T.dot(s)
Out[196]: 
products  aa  bb  cc  xx  yy  zz
products                        
aa         2   1   1   2   1   1
bb         1   1   1   1   0   0
cc         1   1   1   1   0   0
xx         2   1   1   2   1   1
yy         1   0   0   1   1   1
zz         1   0   0   1   1   1

Transforming a data frame column into a counter matrix

Answers (1)

Related Questions