Sunil Sequeira

Reputation: 11

How to group the data in R?

I have a dataset in R with an ID column and an Occurrences column:

ID    Occurrences
1001   A
1001   A
1001   B
1002   C
1002   A
1002   C

I would like one row per unique ID, with the most frequent Occurrences value (the mode) for that ID, like this:

ID     Occurrences
1001   A
1002   C

How can I do this in R? I have tried "table", but it does not give me the result I want.

Upvotes: 1

Views: 76

Answers (3)

BENY

Reputation: 323226

Using base R aggregate

aggregate(df1,by=list(df1$ID),FUN=function(x) names(sort(-table(x)))[1] )[,names(df1)]
    ID Occurrences
1 1001           A
2 1002           C
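The key step in the one-liner above is names(sort(-table(x)))[1]. As a minimal sketch of that step on a single group (the variable names here are illustrative, not from the answer):

```r
# Occurrences values for ID 1002 in the sample data
x <- c("C", "A", "C")

counts <- table(x)                # frequency of each distinct value
ordered <- sort(-counts)          # negate so the most frequent value sorts first
most_common <- names(ordered)[1]  # label of the top entry
print(most_common)
```

Note that aggregate applies FUN to every column, including ID, so ID should come back as character rather than integer; the trailing [,names(df1)] drops the extra grouping column that aggregate adds.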

Upvotes: 1

Ben373

Reputation: 971

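A base R answer without any extra functions or packages. Note that this expression keeps the rows where both the ID and the Occurrences value appear for the first time; it matches the expected output on the sample data only because the first row of each ID happens to hold that ID's mode, and it does not compute the mode in general.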

df[!duplicated(df$ID) & !duplicated(df$Occurrences),]
    ID Occurrences
1 1001           A
4 1002           C
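Since the expression above depends on row order, a package-free alternative that actually counts values per ID may be safer. The following is a sketch under assumptions, not the answerer's method: it rebuilds the sample data under the name df and introduces the illustrative names tab and result.

```r
# Sample data from the question (the name `df` follows the snippet above)
df <- data.frame(
  ID = c(1001L, 1001L, 1001L, 1002L, 1002L, 1002L),
  Occurrences = c("A", "A", "B", "C", "A", "C"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per ID, one column per distinct value
tab <- table(df$ID, df$Occurrences)

# For each ID (row), take the column with the largest count.
# which.max returns the first maximum, so ties go to the first column.
result <- data.frame(
  ID = as.integer(rownames(tab)),
  Occurrences = colnames(tab)[apply(tab, 1, which.max)],
  stringsAsFactors = FALSE
)
print(result)
```

On the sample data this gives A for 1001 and C for 1002, and it keeps giving the mode if the rows are reordered.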

Upvotes: 2

akrun

Reputation: 887118

After grouping by 'ID', get the 'mode' of the 'Occurrences'

library(dplyr)
df1 %>%
   group_by(ID) %>%
   summarise(Occurrences = Mode(Occurrences))
# A tibble: 2 x 2
#    ID Occurrences
#  <int> <chr>      
#1  1001 A          
#2  1002 C      

where Mode is

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
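To see how this helper behaves, including on ties, here is a small check. The function is repeated so the snippet runs on its own; the input vectors are taken from the sample data, plus one made-up tie case.

```r
# Mode helper as defined above, repeated so this snippet is self-contained
Mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]    # value with the highest count
}

m1 <- Mode(c("A", "A", "B"))  # values for ID 1001 -> "A"
m2 <- Mode(c("C", "A", "C"))  # values for ID 1002 -> "C"
m3 <- Mode(c("B", "A"))       # tie: the value that appears first wins -> "B"
print(c(m1, m2, m3))
```

Because which.max returns the first maximum, a tie is resolved in favour of whichever value occurs first in the group, so results on tied groups depend on row order.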

Or using base R

aggregate(Occurrences ~ ID, df1, FUN = Mode)

data

df1 <- structure(list(
  ID = c(1001L, 1001L, 1001L, 1002L, 1002L, 1002L),
  Occurrences = c("A", "A", "B", "C", "A", "C")),
  class = "data.frame", row.names = c(NA, -6L))

Upvotes: 3
