Reputation: 4050
I have a dataframe containing transaction data. Each row represents one transaction, and the columns indicate whether a product from a given category (A-F) was bought in that transaction (1 = yes, 0 = no). I would like to count, for each pair of categories, how many transactions contain both. My dataframe looks as follows:
A B C D E F
1 1 0 0 0 0
1 0 1 1 0 0
The output should be a matrix counting each pair of categories in the dataframe, like so:
A B C D E F
A 4 2 1 0 4 2
B 5 6 7 3 5 1
C 1 6 5 8 7 9
D ...
E ...
F ...
Does anyone know how to solve this?
Thank you very much!
Upvotes: 0
Views: 501
Reputation:
Take the dot product of the transposed dataframe with the dataframe itself:
df.T.dot(df)
Out:
A B C D E F
A 2 1 1 1 0 0
B 1 1 0 0 0 0
C 1 0 1 1 0 0
D 1 0 1 1 0 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0
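For reference, here is a minimal self-contained sketch that reproduces the output above. It assumes the dataframe holds just the two sample rows shown in the question; the variable name `cooc` is only illustrative.

```python
import pandas as pd

# Sample data: the two transactions shown in the question.
df = pd.DataFrame(
    [[1, 1, 0, 0, 0, 0],
     [1, 0, 1, 1, 0, 0]],
    columns=list("ABCDEF"),
)

# Entry (i, j) counts the transactions that contain both category i and
# category j; the diagonal holds the per-category transaction counts.
cooc = df.T.dot(df)
print(cooc)
```

Because the columns are 0/1 indicators, each product of two entries is 1 only when both categories appear in the same row, so summing over rows yields the co-occurrence count. The result is symmetric.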
Note that counting pairwise co-occurrences this way does not scale well, though: the result has one cell for every pair of categories. You might want to look at the Apriori algorithm.
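To give a rough idea of what Apriori does differently, here is a sketch of its first two passes in plain Python. It is not a full implementation: it only illustrates the pruning idea that a pair can be frequent only if both of its members are frequent. The transactions and the `min_count` threshold are hypothetical values made up for illustration, not data from the question.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each is the set of categories bought.
transactions = [{"A", "B"}, {"A", "C", "D"}, {"A", "C"}, {"B", "C"}]
min_count = 2  # hypothetical minimum support, as an absolute count

# Pass 1: count single categories and keep only the frequent ones.
item_counts = Counter(item for t in transactions for item in t)
frequent_items = {item for item, n in item_counts.items() if n >= min_count}

# Pass 2: count only pairs whose members are both frequent
# (the Apriori pruning step), instead of every possible pair.
pair_counts = Counter(
    pair
    for t in transactions
    for pair in combinations(sorted(t & frequent_items), 2)
)
frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= min_count}

print(frequent_pairs)
```

Here D occurs only once, so it is dropped after the first pass and no pair involving D is ever counted. With many rare categories, this pruning is what keeps the number of candidate pairs manageable.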
Upvotes: 3