Reputation: 1
I have a dataframe with columns that contain values of a continuous variable (evapotranspiration) for each month. The column headers for my ET data are the first day of the month (e.g., 2010-01-01, 2010-02-01, etc.). I have another column that contains the year and month of a specific event (e.g., 2010-01) for each row, where each row is a different monitoring station. I want to get the average of the 2 months of my continuous variable (evapotranspiration) that precede the month of the specific event. Example below shows 'new_var' as my desired output.
StationId 2000-01-01 2000-02-01 2000-03-01 2000-04-01 fire_date new_var
1 0.1 0.2 0.3 0.4 2000-03 0.15
2 0.25 0.25 0.5 0.7 2000-04 0.375
Please help! I've not discovered how to index a mean() argument using the position of rows relative to a column header that contains values in a string unique to each row.
Upvotes: 0
Views: 57
Reputation: 389135
You can get data in long format, keep only month and year information from column names. For each StationId
select two previous rows of month that match date in fire_date
and take mean
of those values. Join the data to add that as a new column in the original data.
library(dplyr)
df %>%
tidyr::pivot_longer(cols = matches('\\d+-\\d+-\\d+')) %>%
mutate(name = substr(name, 1, 7)) %>%
group_by(StationId) %>%
slice({i <- which(fire_date == name);c(i-2, i-1)}) %>%
summarise(new_var = mean(value)) %>%
left_join(df, by = 'StationId')
# StationId new_var `2000-01-01` `2000-02-01` `2000-03-01` `2000-04-01` fire_date
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#1 1 0.15 0.1 0.2 0.3 0.4 2000-03
#2 2 0.375 0.25 0.25 0.5 0.7 2000-04
data
df <- structure(list(StationId = 1:2, `2000-01-01` = c(0.1, 0.25),
`2000-02-01` = c(0.2, 0.25), `2000-03-01` = c(0.3, 0.5),
`2000-04-01` = c(0.4, 0.7), fire_date = c("2000-03", "2000-04"
)), row.names = c(NA, -2L), class = "data.frame")
Upvotes: 1