Natalie Collar
Natalie Collar

Reputation: 1

Get average of rows based on their position to a column header that matches a string

I have a dataframe with columns that contain values of a continuous variable (evapotranspiration) for each month. The column headers for my ET data are the first day of the month (e.g., 2010-01-01, 2010-02-01, etc.). I have another column that contains the year and month of a specific event (e.g., 2010-01) for each row, where each row is a different monitoring station. I want to get the average of the 2 months of my continuous variable (evapotranspiration) that precede the month of the specific event. Example below shows 'new_var' as my desired output.

StationId 2000-01-01  2000-02-01  2000-03-01  2000-04-01  fire_date  new_var
1         0.1         0.2         0.3         0.4         2000-03    0.15
2         0.25        0.25        0.5         0.7         2000-04    0.375

Please help! I've not discovered how to index a mean() argument using the position of rows relative to a column header that contains values in a string unique to each row.

Upvotes: 0

Views: 57

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389135

You can get data in long format, keep only month and year information from column names. For each StationId select two previous rows of month that match date in fire_date and take mean of those values. Join the data to add that as a new column in the original data.

library(dplyr)

df %>%
  tidyr::pivot_longer(cols = matches('\\d+-\\d+-\\d+')) %>%
  mutate(name = substr(name, 1, 7)) %>%
  group_by(StationId) %>%
  slice({i <- which(fire_date == name);c(i-2, i-1)}) %>%
  summarise(new_var = mean(value)) %>%
  left_join(df, by = 'StationId')

# StationId new_var `2000-01-01` `2000-02-01` `2000-03-01` `2000-04-01` fire_date
#      <int>   <dbl>        <dbl>        <dbl>        <dbl>        <dbl> <chr>    
#1         1   0.15          0.1          0.2           0.3          0.4 2000-03  
#2         2   0.375         0.25         0.25          0.5          0.7 2000-04  

data

df <- structure(list(StationId = 1:2, `2000-01-01` = c(0.1, 0.25), 
    `2000-02-01` = c(0.2, 0.25), `2000-03-01` = c(0.3, 0.5), 
    `2000-04-01` = c(0.4, 0.7), fire_date = c("2000-03", "2000-04"
    )), row.names = c(NA, -2L), class = "data.frame")

Upvotes: 1

Related Questions