Reputation: 51
I have a dataframe in r which contains information about clients purchasing history of the last year the data frame looks something like this:
Client | Prod A | Prod B | Prod C
---------------------------------
A | 1 | 0 | 1
B | 1 | 1 | 0
C | 1 | 0 | 1
D | 0 | 0 | 1
E | 1 | 0 | 0
---------------------------------
Where 1 means the client has purchased the product at some point and 0 it hasnt bought it at all.
In this particular table the most frequent combination is Product A and Product C with 2 cases out of 5. I want to find a method/function that will get me the most common combination of products for a data frame of this type of any dimensions. Thanks in advance for your help.
Upvotes: 0
Views: 227
Reputation: 2783
library(dplyr)
df <- data.frame(Client = c("A", "B", "C", "D", "E"),
`Prod A` = c(1, 1, 1, 0, 1),
`Prod B` = c(0, 1, 0, 0, 0),
`Prod C` = c(1, 0, 1, 1, 0))
df %>%
rowwise() %>%
mutate(length = sum(Prod.A, Prod.B, Prod.C)) %>%
group_by(Prod.A, Prod.B, Prod.C) %>%
mutate(count = n()) %>%
ungroup() %>%
filter(count == max(count) & length > 1) %>%
select(1:4)
which will produce:
Client Prod.A Prod.B Prod.C
<chr> <dbl> <dbl> <dbl>
1 A 1 0 1
2 C 1 0 1
Upvotes: 0
Reputation: 947
Solution using dplyr
library(dplyr)
df <- data.frame(Client = c("A","B","C","D","E"),
`Prod A` = c(1,1,1,0,1),
`Prod B` = c(0,1,0,0,0),
`Prod C` = c(1,0,1,1,0))
df %>%
dplyr::group_by(Prod.A,Prod.B,Prod.C) %>%
dplyr::summarise(count = n())
# A tibble: 4 x 4
# Groups: Prod.A, Prod.B [3]
Prod.A Prod.B Prod.C count
<dbl> <dbl> <dbl> <int>
1 0 0 1 1
2 1 0 0 1
3 1 0 1 2
4 1 1 0 1
Upvotes: 1
Reputation: 160742
res <- as.data.frame(xtabs(~., data=dat[,-1]))
res
# Prod.A Prod.B Prod.C Freq
# 1 0 0 0 0
# 2 1 0 0 1
# 3 0 1 0 0
# 4 1 1 0 1
# 5 0 0 1 1
# 6 1 0 1 2
# 7 0 1 1 0
# 8 1 1 1 0
From this you can see the counts of combinations, the "max" of which is
subset(res, Freq == max(Freq))
# Prod.A Prod.B Prod.C Freq
# 6 1 0 1 2
Upvotes: 1
Reputation: 10385
Your dataframe in df
aggregate(Client~Prod.A+Prod.B+Prod.C,df,length)
Prod.A Prod.B Prod.C Client
1 1 0 0 1
2 1 1 0 1
3 0 0 1 1
4 1 0 1 2
the last column Client giving the count
Upvotes: 1