Reputation: 111
I have an order data set in the following format:
Ordernumber; Category; # Sold Items
123; A; 3
123; B; 4
234; B; 2
234; C; 1
234; D; 5
...
So, every order has as many lines as there were different categories in the order.
Now, I want to count for every category pair how often they were ordered together in one order.
In the end I would like to have a "correlation" matrix like this
A B C D
A 1
B 1 1 1
C 1 1
D 1 1
Has anyone a good (simple) idea?
Thank you so much!
Upvotes: 1
Views: 118
Reputation: 25914
Perhaps using matrix multiplication gets you there:
dat <- read.table(header=T, text="Ordernumber; Category; Sold Items
123; A; 3
123; B; 4
234; B; 2
234; C; 1
234; D; 5", sep=";")
tt <- table(dat[1:2])
crossprod(tt) # t(tt) %*% tt
# Category
#Category A B C D
# A 1 1 0 0
# B 1 2 1 1
# C 0 1 1 1
# D 0 1 1 1
This has the diagonal but can easily be removed with diag
Upvotes: 1