Hatem
Hatem

Reputation: 25

Counting the average of duplicates per id in R

My data looks like this:

id date
1 a
1 a
1 b
1 c
1 c
1 c
2 z
2 z
2 e
2 x

I want to calculate the average of duplicates per id i.e for id=1 we have 2a 1b 3c I want the output to be 2. The result shoulbe like this:

id mean
1 2
2 1.333

Upvotes: 1

Views: 446

Answers (7)

DeBARtha
DeBARtha

Reputation: 458

A table approach using the base package:

at<-table(a$id,a$date)
apply(at,1,function(x) sum(x)/sum(x!=0))

#       1        2 
#2.000000 1.333333

The dataset:

a = data.frame('id'=c(1,1,1,1,1,1,2,2,2,2),'date'=c('a','a','b','c','c','c','a','a','e','z'))

Upvotes: 1

akrun
akrun

Reputation: 887231

We can use

library(dplyr)
df1 %>%
     group_by(id) %>%
     summarise(Mean = count(cur_data(), date) %>% 
              pull(n) %>%
               mean)

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101753

Another data.table option

> setDT(df)[, .(Mean = .N / uniqueN(date)), id]
   id     Mean
1:  1 2.000000
2:  2 1.333333

or

dcast(setDT(df), id ~ date, fill = NA)[, .(Mean = rowMeans(.SD, na.rm = TRUE)), id]

gives

   id     Mean
1:  1 2.000000
2:  2 1.333333

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 389055

You can use mean(table(date)) to get average of counts, apply it by for each id value.

Using dplyr -

library(dplyr)

df %>%
  group_by(id) %>%
  summarise(mean = mean(table(date)))

#     id  mean
#  <int> <dbl>
#1     1  2   
#2     2  1.33

Or with base R aggregate.

aggregate(date~id, df, function(x) mean(table(x)))

Upvotes: 6

dy_by
dy_by

Reputation: 1241

using data.table package

library(data.table)
# dt <- your_data_frame %>% as.data.table()  ## convert to table from frame
dt[, .(N=.N), by = .(id,date)][, .(mean = mean(N)), by = id]

Upvotes: 3

elielink
elielink

Reputation: 1202

here is a package free solution

a = cbind(c(1,1,1,1,1,1,2,2,2,2),c('a','a','b','c','c','c','a','a','e','z'))
b = matrix(ncol = 2)[-1,]
for(i in unique(a[,1])){
  
  b=rbind(b,c(i,sum(table(a[a[,1]==i,2]))/length(table(a[a[,1]==i,2]))))
}

The output:

    [,1] [,2]              
[1,] "1"  "2"               
[2,] "2"  "1.33333333333333"

Upvotes: 0

Roman
Roman

Reputation: 17648

You can try a tidyverse

library(tidyverse)
d %>% 
  group_by(id) %>% 
  count(date) %>% 
  summarise(mean = mean(n))
# A tibble: 2 x 2
     id  mean
  <int> <dbl>
1     1  2   
2     2  1.33

Using base R you can try

foo <- function(x) mean(rle(x)$length)
aggregate(d$date, by=list(d$id), foo)

The data

d <- read.table(text ="id   date
1     a
1     a
1     b
1     c
1     c
1     c
2     a
2     a
2     e
2     z", header=T)

Upvotes: 4

Related Questions