Reputation: 544
I have a table like
ID ProductBought
1 A
1 B
1 C
2 A
1 B
2 C
3 B
3 C
2 D
3 D
4 A
4 B
4 C
I intend to calculate : Bought A , also bought B - 2 cases ( ID 1 and 4 ) , overall = 2/3 IDs ( 3 IDs bought A of which 2 bought B )
i know this would be related to association rules/apriori but i want overall aggregated numbers/calculations for all possible product combinations , below is illustration for the kind of table output m looking at :
Category Total distinct customer( in LHS ) % cross sell
A to B 3 66%
A to C 3 66 %
B to C 3 100 %
Upvotes: 0
Views: 346
Reputation: 56189
There must a better/cleaner way but here using dplyr:
library(dplyr)
df1 %>%
group_by(ProductBought) %>%
mutate(distinctCustomerN = n_distinct(ID)) %>%
ungroup() %>%
left_join(df1, by = "ID") %>%
filter(ProductBought.x != ProductBought.y) %>%
group_by(ProductBought.x, ProductBought.y, distinctCustomerN) %>%
summarise(n = n_distinct(ID)) %>%
mutate(n_pc = n/distinctCustomerN * 100)
# ProductBought.x ProductBought.y distinctCustomerN n n_pc
# <fctr> <fctr> <int> <int> <dbl>
# 1 A B 3 2 66.66667
# 2 A C 3 3 100.00000
# 3 A D 3 1 33.33333
# 4 B A 3 2 66.66667
# 5 B C 3 3 100.00000
# 6 B D 3 1 33.33333
# 7 C A 4 3 75.00000
# 8 C B 4 3 75.00000
# 9 C D 4 2 50.00000
# 10 D A 2 1 50.00000
# 11 D B 2 1 50.00000
# 12 D C 2 2 100.00000
Upvotes: 1