Abhinav Sharma
Abhinav Sharma

Reputation: 45

How to get the values with maximum occurrence across range of columns based on another column

I have a data frame with 29 rows and 26 column with a lot of NA's. Data looks somewhat like shown below( working on R studio)

df <-

V1   V2   V3   V4   V5   V6   V7   V8   V9   V10

a1   b1   d    f    d    d    na   na   na    f

a1   b2   d    d    d    f    na   f    na   na

a1   b3   d    f    d    f    na   na   d    d

a2   c1   f    f    d    na   na   d    d    f

a2   c2   f    d    d    f    na   f    na   na

a2   c3   d    f    d    f    na   na   f    d

Here we have V1-V10 columns. a1 and a2 are 2 distinct values in column V1,

b1-b3 in column V2 are distinct values related to a1 in V1 and c1-c3 related to a2.

column V3- V10 we have distinct values in each rows related to a1 and a2

Result i want is as below-

NewV1      max.occurrence(V3-V10)

 a1            d

 a2            f

to summarize i want to get the value with maximum occurrence(max.occurrence(V3-V10)) across column V3-V10 based on V1. NOTE= NA to be excluded.

Upvotes: 0

Views: 890

Answers (2)

C. Braun
C. Braun

Reputation: 5191

If you like dplyr, this would work:

df %>% 
    gather("key", "value", V3:V10) %>%
    group_by(V1) %>% 
    dplyr::summarise(max.occurence = names(which.max(table(value))))

This gives:

# A tibble: 2 x 2
  V1    max.occurence
  <fct> <chr>        
1 a1    d            
2 a2    f 

Upvotes: 0

Jaap
Jaap

Reputation: 83215

Another possiblity using the data.table-package:

library(data.table)

melt(setDT(df),
     id = 1:2,
     na.rm = TRUE)[, .N, by = .(V1, value)
                   ][order(-N), .(max.occ = value[1]), by = V1]

which gives:

   V1 max.occ
1: a1       d
2: a2       f

A similar logic with the tidyverse-packages:

library(dplyr)
library(tidyr)

df %>% 
  gather(k, v, V3:V10, na.rm = TRUE) %>% 
  group_by(V1, v) %>% 
  tally() %>% 
  arrange(-n) %>% 
  slice(1) %>% 
  select(V1, max.occ = v)

Upvotes: 1

Related Questions