Reputation: 25
My data looks like this:
id | date |
---|---|
1 | a |
1 | a |
1 | b |
1 | c |
1 | c |
1 | c |
2 | z |
2 | z |
2 | e |
2 | x |
I want to calculate the average of duplicates per id i.e for id=1 we have 2a 1b 3c I want the output to be 2. The result shoulbe like this:
id | mean |
---|---|
1 | 2 |
2 | 1.333 |
Upvotes: 1
Views: 446
Reputation: 458
A table
approach using the base
package:
at<-table(a$id,a$date)
apply(at,1,function(x) sum(x)/sum(x!=0))
# 1 2
#2.000000 1.333333
The dataset:
a = data.frame('id'=c(1,1,1,1,1,1,2,2,2,2),'date'=c('a','a','b','c','c','c','a','a','e','z'))
Upvotes: 1
Reputation: 887231
We can use
library(dplyr)
df1 %>%
group_by(id) %>%
summarise(Mean = count(cur_data(), date) %>%
pull(n) %>%
mean)
Upvotes: 2
Reputation: 101753
Another data.table
option
> setDT(df)[, .(Mean = .N / uniqueN(date)), id]
id Mean
1: 1 2.000000
2: 2 1.333333
or
dcast(setDT(df), id ~ date, fill = NA)[, .(Mean = rowMeans(.SD, na.rm = TRUE)), id]
gives
id Mean
1: 1 2.000000
2: 2 1.333333
Upvotes: 2
Reputation: 389055
You can use mean(table(date))
to get average of counts, apply it by for each id
value.
Using dplyr -
library(dplyr)
df %>%
group_by(id) %>%
summarise(mean = mean(table(date)))
# id mean
# <int> <dbl>
#1 1 2
#2 2 1.33
Or with base R aggregate
.
aggregate(date~id, df, function(x) mean(table(x)))
Upvotes: 6
Reputation: 1241
using data.table
package
library(data.table)
# dt <- your_data_frame %>% as.data.table() ## convert to table from frame
dt[, .(N=.N), by = .(id,date)][, .(mean = mean(N)), by = id]
Upvotes: 3
Reputation: 1202
here is a package free solution
a = cbind(c(1,1,1,1,1,1,2,2,2,2),c('a','a','b','c','c','c','a','a','e','z'))
b = matrix(ncol = 2)[-1,]
for(i in unique(a[,1])){
b=rbind(b,c(i,sum(table(a[a[,1]==i,2]))/length(table(a[a[,1]==i,2]))))
}
The output:
[,1] [,2]
[1,] "1" "2"
[2,] "2" "1.33333333333333"
Upvotes: 0
Reputation: 17648
You can try a tidyverse
library(tidyverse)
d %>%
group_by(id) %>%
count(date) %>%
summarise(mean = mean(n))
# A tibble: 2 x 2
id mean
<int> <dbl>
1 1 2
2 2 1.33
Using base R
you can try
foo <- function(x) mean(rle(x)$length)
aggregate(d$date, by=list(d$id), foo)
The data
d <- read.table(text ="id date
1 a
1 a
1 b
1 c
1 c
1 c
2 a
2 a
2 e
2 z", header=T)
Upvotes: 4