German Romero Acevedo
German Romero Acevedo

Reputation: 51

How to find the mode across rows in R

I have a dataframe in r which contains information about clients purchasing history of the last year the data frame looks something like this:

Client | Prod A | Prod B | Prod C
---------------------------------
  A    |   1    |    0   |   1
  B    |   1    |    1   |   0
  C    |   1    |    0   |   1 
  D    |   0    |    0   |   1
  E    |   1    |    0   |   0
---------------------------------

Where 1 means the client has purchased the product at some point and 0 it hasnt bought it at all.

In this particular table the most frequent combination is Product A and Product C with 2 cases out of 5. I want to find a method/function that will get me the most common combination of products for a data frame of this type of any dimensions. Thanks in advance for your help.

Upvotes: 0

Views: 227

Answers (4)

koolmees
koolmees

Reputation: 2783

library(dplyr)

df <- data.frame(Client = c("A", "B", "C", "D", "E"), 
                 `Prod A` = c(1, 1, 1, 0, 1),
                 `Prod B` = c(0, 1, 0, 0, 0),
                 `Prod C` = c(1, 0, 1, 1, 0))

df %>%
  rowwise() %>%
  mutate(length = sum(Prod.A, Prod.B, Prod.C)) %>%
  group_by(Prod.A, Prod.B, Prod.C) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  filter(count == max(count) & length > 1) %>%
  select(1:4)

which will produce:

  Client Prod.A Prod.B Prod.C
  <chr>   <dbl>  <dbl>  <dbl>
1 A           1      0      1
2 C           1      0      1

Upvotes: 0

tacoman
tacoman

Reputation: 947

Solution using dplyr

library(dplyr)
df <- data.frame(Client = c("A","B","C","D","E"),
                 `Prod A` = c(1,1,1,0,1),
                 `Prod B` = c(0,1,0,0,0),
                 `Prod C` = c(1,0,1,1,0))

df %>%
     dplyr::group_by(Prod.A,Prod.B,Prod.C) %>%
     dplyr::summarise(count = n())
# A tibble: 4 x 4
# Groups:   Prod.A, Prod.B [3]
  Prod.A Prod.B Prod.C count
   <dbl>  <dbl>  <dbl> <int>
1      0      0      1     1
2      1      0      0     1
3      1      0      1     2
4      1      1      0     1

Upvotes: 1

r2evans
r2evans

Reputation: 160742

res <- as.data.frame(xtabs(~., data=dat[,-1]))
res
#   Prod.A Prod.B Prod.C Freq
# 1      0      0      0    0
# 2      1      0      0    1
# 3      0      1      0    0
# 4      1      1      0    1
# 5      0      0      1    1
# 6      1      0      1    2
# 7      0      1      1    0
# 8      1      1      1    0

From this you can see the counts of combinations, the "max" of which is

subset(res, Freq == max(Freq))
#   Prod.A Prod.B Prod.C Freq
# 6      1      0      1    2

Upvotes: 1

user2974951
user2974951

Reputation: 10385

Your dataframe in df

aggregate(Client~Prod.A+Prod.B+Prod.C,df,length)

  Prod.A Prod.B Prod.C Client
1      1      0      0      1
2      1      1      0      1
3      0      0      1      1
4      1      0      1      2

the last column Client giving the count

Upvotes: 1

Related Questions