Dani.Elizabeth
Dani.Elizabeth

Reputation: 17

Calculate means beyond a given date R

I have a large dataframe (Merged) of several thousand lines and 20 columns. Each line is a record of a site visit. There are 400 sites in the dataframe and each has multiple lines for visits on different months. . Example:

Site Date Method Result ChangeDate
2200 01/07/2021 24 0.22 01/07/2021
2201 03/07/2021 21 0.01 01/01/2021
2203 03/07/2021 21 0.02 01/01/2021
2200 01/08/2021 21 0.05 01/07/2021
2201 03/08/2021 21 0.01 01/01/2021
2203 03/08/2021 21 0.02 01/01/2021
2200 01/09/2021 21 0.05 01/07/2021

And so on for all 400 sites over two years.

I need to calculate the mean result for each site, which I can do:

OPmean <- aggregate(Merged$Result, list(Merged$Site), FUN=mean, na.rm = TRUE

But I only want to take the means after the change date. So on site 2200 I would want to take the mean on these lines only:

Site Date Method Result ChangeDate
2200 01/08/2021 21 0.05 01/07/2021
2200 01/09/2021 21 0.05 01/07/2021

Leaving the first line for 2200 off because the date occurs on (or before) the change date.

For an output I would want a new dataframe something like this:

Site Mean Result
2200 0.05
2201 0.01
2203 0.02

Upvotes: 1

Views: 35

Answers (2)

Michael Roswell
Michael Roswell

Reputation: 1482

@user2974951 gives a great base R answer. A tidyverse approach might be

library(dplyr) 
Merged %>% 
     group_by(Site) %>% 
     filter(Date > ChangeDate) %>% 
     summarize(mr = mean(Result) # or optionally, mean(Result, na.rm = TRUE)

Upvotes: 1

user2974951
user2974951

Reputation: 10385

Small change to your code

aggregate(
  Result~Site,
  data=df[df$Date>df$ChangeDate,],
  FUN=mean
)

  Site Result
1 2200   0.05
2 2201   0.01
3 2203   0.02

Upvotes: 2

Related Questions