Al. So
Al. So

Reputation: 53

For every unique value in column find the min 4 dates

I am trying to find the min 4 dates for every unique value in a column and take the average of those 4.

For example

ID      Date      Value
aa1    1/1/2006    5
aa1    2/1/2007    4
aa1    3/2/2005    7
aa1    1/20/1998   10
aa1    5/3/2001    9
aa1    7/4/2010    10
aa2n   4/3/2002    5
aa2n   4/5/2002    6
aa2n   6/30/2011   7
aa2n   6/21/2012   5
aa2n   1/5/1998    3

I would like to output to be

ID     Avg Value
aa1      7.75
aa2n     5.25

Upvotes: 2

Views: 54

Answers (2)

MKR
MKR

Reputation: 20095

You can use dplyr::top_n with n=-4 to select records for minimum 4 dates. The use dplyr::summarise to calculate mean for Value.

library(dplyr)
library(lubridate)

df %>% mutate(Date = mdy(Date)) %>%
  group_by(ID) %>%
  top_n(-4, Date) %>%    #Last 4 records, ordered by Date
  summarise(AvgVal = mean(Value))

# A tibble: 2 x 2
#  ID    AvgVal
#  <chr>  <dbl>
#1 aa1     7.75
#2 aa2n    5.25

Data:

df <- read.table(text = 
"ID      Date      Value
aa1    1/1/2006    5
aa1    2/1/2007    4
aa1    3/2/2005    7
aa1    1/20/1998   10
aa1    5/3/2001    9
aa1    7/4/2010    10
aa2n   4/3/2002    5
aa2n   4/5/2002    6
aa2n   6/30/2011   7
aa2n   6/21/2012   5
aa2n   1/5/1998    3",
stringsAsFactors = FALSE, header = TRUE)

Upvotes: 2

akrun
akrun

Reputation: 887138

We convert the 'Date' to Date class, arrange by 'Date' and get the mean of first 4 'Value' grouped by 'ID'

library(dplyr)
library(lubridate)
df1 %>% 
  group_by(ID) %>% 
  arrange(mdy(Date)) %>%
  summarise(Value = mean(head(Value, 4)))
# A tibble: 2 x 2
#  ID    Value
#   <chr> <dbl>
#1 aa1    7.75
#2 aa2n   5.25

Or we can do this with data.table

library(data.table)
setDT(df1)[order(mdy(Date)),.(Value = mean(head(Value, 4))), ID]
#     ID Value
#1: aa2n  5.25
#2:  aa1  7.75

Upvotes: 2

Related Questions