Learner_seeker
Learner_seeker

Reputation: 544

Association between products or cross sell %

I have a table like

ID     ProductBought
1       A
1       B
1       C
2       A
1       B
2       C
3       B
3       C
2       D
3       D
4       A
4       B 
4       C

I intend to calculate : Bought A , also bought B - 2 cases ( ID 1 and 4 ) , overall = 2/3 IDs ( 3 IDs bought A of which 2 bought B )

i know this would be related to association rules/apriori but i want overall aggregated numbers/calculations for all possible product combinations , below is illustration for the kind of table output m looking at :

Category  Total distinct customer( in LHS )     % cross sell
A to B        3                                     66% 
A to C        3                                     66 % 
B to C        3                                     100 %

Upvotes: 0

Views: 346

Answers (1)

zx8754
zx8754

Reputation: 56189

There must a better/cleaner way but here using dplyr:

library(dplyr)

df1 %>% 
  group_by(ProductBought) %>% 
  mutate(distinctCustomerN = n_distinct(ID)) %>% 
  ungroup() %>% 
  left_join(df1, by = "ID") %>% 
  filter(ProductBought.x != ProductBought.y) %>% 
  group_by(ProductBought.x, ProductBought.y, distinctCustomerN) %>% 
  summarise(n = n_distinct(ID)) %>% 
  mutate(n_pc = n/distinctCustomerN * 100)

#    ProductBought.x ProductBought.y distinctCustomerN     n      n_pc
#             <fctr>          <fctr>             <int> <int>     <dbl>
# 1                A               B                 3     2  66.66667
# 2                A               C                 3     3 100.00000
# 3                A               D                 3     1  33.33333
# 4                B               A                 3     2  66.66667
# 5                B               C                 3     3 100.00000
# 6                B               D                 3     1  33.33333
# 7                C               A                 4     3  75.00000
# 8                C               B                 4     3  75.00000
# 9                C               D                 4     2  50.00000
# 10               D               A                 2     1  50.00000
# 11               D               B                 2     1  50.00000
# 12               D               C                 2     2 100.00000

Upvotes: 1

Related Questions