Reputation: 53
I am trying to find the min 4 dates for every unique value in a column and take the average of those 4.
For example
ID Date Value
aa1 1/1/2006 5
aa1 2/1/2007 4
aa1 3/2/2005 7
aa1 1/20/1998 10
aa1 5/3/2001 9
aa1 7/4/2010 10
aa2n 4/3/2002 5
aa2n 4/5/2002 6
aa2n 6/30/2011 7
aa2n 6/21/2012 5
aa2n 1/5/1998 3
I would like to output to be
ID Avg Value
aa1 7.75
aa2n 5.25
Upvotes: 2
Views: 54
Reputation: 20095
You can use dplyr::top_n
with n=-4
to select records for minimum 4 dates. The use dplyr::summarise
to calculate mean
for Value
.
library(dplyr)
library(lubridate)
df %>% mutate(Date = mdy(Date)) %>%
group_by(ID) %>%
top_n(-4, Date) %>% #Last 4 records, ordered by Date
summarise(AvgVal = mean(Value))
# A tibble: 2 x 2
# ID AvgVal
# <chr> <dbl>
#1 aa1 7.75
#2 aa2n 5.25
Data:
df <- read.table(text =
"ID Date Value
aa1 1/1/2006 5
aa1 2/1/2007 4
aa1 3/2/2005 7
aa1 1/20/1998 10
aa1 5/3/2001 9
aa1 7/4/2010 10
aa2n 4/3/2002 5
aa2n 4/5/2002 6
aa2n 6/30/2011 7
aa2n 6/21/2012 5
aa2n 1/5/1998 3",
stringsAsFactors = FALSE, header = TRUE)
Upvotes: 2
Reputation: 887138
We convert the 'Date' to Date
class, arrange
by 'Date' and get the mean
of first 4 'Value' grouped by 'ID'
library(dplyr)
library(lubridate)
df1 %>%
group_by(ID) %>%
arrange(mdy(Date)) %>%
summarise(Value = mean(head(Value, 4)))
# A tibble: 2 x 2
# ID Value
# <chr> <dbl>
#1 aa1 7.75
#2 aa2n 5.25
Or we can do this with data.table
library(data.table)
setDT(df1)[order(mdy(Date)),.(Value = mean(head(Value, 4))), ID]
# ID Value
#1: aa2n 5.25
#2: aa1 7.75
Upvotes: 2