Reputation: 45
I have a data frame with 29 rows and 26 column with a lot of NA's. Data looks somewhat like shown below( working on R studio)
df <-
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
a1 b1 d f d d na na na f
a1 b2 d d d f na f na na
a1 b3 d f d f na na d d
a2 c1 f f d na na d d f
a2 c2 f d d f na f na na
a2 c3 d f d f na na f d
Here we have V1-V10 columns. a1 and a2 are 2 distinct values in column V1,
b1-b3 in column V2 are distinct values related to a1 in V1 and c1-c3 related to a2.
column V3- V10 we have distinct values in each rows related to a1 and a2
Result i want is as below-
NewV1 max.occurrence(V3-V10)
a1 d
a2 f
to summarize i want to get the value with maximum occurrence(max.occurrence(V3-V10)) across column V3-V10 based on V1. NOTE= NA to be excluded.
Upvotes: 0
Views: 890
Reputation: 5191
If you like dplyr
, this would work:
df %>%
gather("key", "value", V3:V10) %>%
group_by(V1) %>%
dplyr::summarise(max.occurence = names(which.max(table(value))))
This gives:
# A tibble: 2 x 2
V1 max.occurence
<fct> <chr>
1 a1 d
2 a2 f
Upvotes: 0
Reputation: 83215
Another possiblity using the data.table
-package:
library(data.table)
melt(setDT(df),
id = 1:2,
na.rm = TRUE)[, .N, by = .(V1, value)
][order(-N), .(max.occ = value[1]), by = V1]
which gives:
V1 max.occ
1: a1 d
2: a2 f
A similar logic with the tidyverse
-packages:
library(dplyr)
library(tidyr)
df %>%
gather(k, v, V3:V10, na.rm = TRUE) %>%
group_by(V1, v) %>%
tally() %>%
arrange(-n) %>%
slice(1) %>%
select(V1, max.occ = v)
Upvotes: 1