shopping basket analysis in python with pandas

Question

I have a dataframe containing transaction data. Each row represents one transaction, and the columns indicate whether a product from a category (categories are A-F) was bought (1 = yes, 0 = no). I would like to count how often each pair of categories occurs together across transactions. My dataframe looks as follows:

A  B  C  D  E  F  
1  1  0  0  0  0   
1  0  1  1  0  0

The output should be a matrix counting each pair of categories in the dataframe, like so:

  A B C D E F
A 4 2 1 0 4 2
B 5 6 7 3 5 1
C 1 6 5 8 7 9
D ...
E ...
F ...

Does anyone know how to solve this?

Thank you very much!

user2285236 · Accepted Answer

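Take the dot product of the transposed dataframe with the dataframe itself. Entry (i, j) of the result is the number of transactions that contain both category i and category j; the diagonal holds the per-category totals: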

df.T.dot(df)
Out: 
   A  B  C  D  E  F
A  2  1  1  1  0  0
B  1  1  0  0  0  0
C  1  0  1  1  0  0
D  1  0  1  1  0  0
E  0  0  0  0  0  0
F  0  0  0  0  0  0
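
For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It rebuilds the two example rows from the question; the variable names `cooccurrence` and `pairs_only` are just illustrative, and zeroing the diagonal is an optional extra step, not something the question asked for.

```python
import pandas as pd

# The two example transactions from the question (1 = bought, 0 = not bought).
df = pd.DataFrame(
    [[1, 1, 0, 0, 0, 0],
     [1, 0, 1, 1, 0, 0]],
    columns=list("ABCDEF"),
)

# Entry (i, j) sums df[i] * df[j] over all rows, i.e. the number of
# transactions in which categories i and j were both bought.
cooccurrence = df.T.dot(df)

# Optional: the diagonal pairs each category with itself. Zero it out
# if only pairs of distinct categories are of interest.
pairs_only = cooccurrence.copy()
for cat in pairs_only.index:
    pairs_only.loc[cat, cat] = 0

print(cooccurrence)
print(pairs_only)
```

Because the columns hold only 0 and 1, the product `df[i] * df[j]` is 1 exactly when both categories appear in a row, which is why a plain matrix product gives the counts.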

Note that counting pairwise co-occurrences this way does not scale well, since the result has one entry for every pair of categories. You might want to look at the Apriori algorithm.
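
To give a rough idea of what Apriori does differently, here is a sketch of its pruning step for pairs only. It is not the full algorithm (which continues to larger itemsets), and the function name `frequent_pairs`, the `min_support` threshold, and the third transaction row are assumptions made up for illustration.

```python
from itertools import combinations

import pandas as pd


def frequent_pairs(df, min_support):
    """Return {(cat_a, cat_b): count} for pairs that occur together in at
    least `min_support` transactions (hypothetical helper, pairs only)."""
    # Level 1: keep only categories that are frequent on their own.
    # A pair cannot be frequent if either of its members is not.
    item_counts = df.sum()
    frequent_items = [c for c in df.columns if item_counts[c] >= min_support]

    # Level 2: count only pairs built from the surviving categories.
    result = {}
    for a, b in combinations(frequent_items, 2):
        count = int((df[a] * df[b]).sum())
        if count >= min_support:
            result[(a, b)] = count
    return result


# The question's two rows plus one invented row, for illustration only.
df = pd.DataFrame(
    [[1, 1, 0, 0, 0, 0],
     [1, 0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0, 0]],
    columns=list("ABCDEF"),
)

pairs = frequent_pairs(df, min_support=2)
print(pairs)
```

With a threshold of 2, only A and C survive the first level, so a single pair is counted instead of all fifteen.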
