pd farhad
pd farhad

Reputation: 6432

Transforming a data frame column into a counter matrix

I have a data frame like this. Invoices are transformed into a table where each row has a single product. This table consists of millions of rows -

invoice products
INVC1 xx
INVC1 yy
INVC1 zz
INVC1 aa
INVC2 xx
INVC2 aa
INVC2 bb
INVC2 cc

Now I want to create a table like the one below where I can see how many times each product is purchased with other products-

xx yy zz aa bb cc
xx 0 1 1 2 1 1
yy 1 0 1 1 0 0
zz 1 1 0 1 0 0
aa 2 1 1 0 1 1
bb 1 0 0 1 0 1
cc 1 0 0 1 1 0

Is there any numpy/pandas way to create a table like this? Or, is there any optimized way to do this.

Upvotes: 1

Views: 66

Answers (1)

BENY
BENY

Reputation: 323316

Let us use pd.crosstab then dot

s = pd.crosstab(df['invoice'],df['products'])
out = s.T.dot(s)
Out[196]: 
products  aa  bb  cc  xx  yy  zz
products                        
aa         2   1   1   2   1   1
bb         1   1   1   1   0   0
cc         1   1   1   1   0   0
xx         2   1   1   2   1   1
yy         1   0   0   1   1   1
zz         1   0   0   1   1   1

Upvotes: 6

Related Questions