Manuel Frias
Manuel Frias

Reputation: 311

Apply function to variable subset of columns

Consider this toy data frame:

df <- data.frame(id = c(1, 2),
             meandoy = c(3,2),
             temp199701 = c(4,2),
             temp199702 = c(15,10),
             temp199703 = c(-3,7),
             temp199704 = c(-1,6),
             temp199801 = c(1,5),
             temp199802 = c(9,10),
             temp199803 = c(-2,2),
             temp199804 = c(-5,11))

I want to add a new column with the result of a function for each year and each row. In other words, each new GDDyearcolumn gets the value of calculating from tempyear01to tempyear04.

I can achieve it with this:

sum.GDD <- function(x) sum(x[x > 5]-5, na.rm = TRUE)
    
yearlist <- c(1997, 1998)
        
for (year in yearlist){
      text <- paste("GDD",toString(year), sep = "")
      df[[text]] <- df %>%  #store result in this vector
        dplyr::select(contains(toString(year))) %>% #take variables that have year
        apply(1, sum.GDD) #calculate GDD5 across those variables
    }

But there is a twist. I want to apply the function only to the number of columns specified in meandoy each year.

For example, GDD1997 in the first row will be the result of calculating the first 3 columns starting from temp199701, because meandoy = 3. GDD1998 will get the result from temp199801, temp199802 and temp199803.

In the second row the meandoy = 2 so the result of GDD1997 will be calculated from temp199701 and temp199702. GDD1998 from temp199801 and temp199802.

Upvotes: 2

Views: 200

Answers (1)

Robin Gertenbach
Robin Gertenbach

Reputation: 10776

If in doubt a problem is usually made simpler by turning the data into a long format.

Since you're already using dplyr we can:

totals <- df %>%
  # Turn the dataframe into format id, meandoy, year, doy, value by parsing
  # the columns while unpivoting.
  pivot_longer(
    c(everything(), -id, -meandoy), 
    names_to = c("year", "doy"), names_pattern = "temp(\\d{4})(\\d{2})", 
    names_transform = list(year = as.integer, doy = as.integer)
  ) %>%
  # Selects all columns (in the original df) from year01 to year<meandoy>.
  filter(doy <= meandoy) %>%
  # Calculate the GDD  
  group_by(id, year) %>%
  summarize(total = sum.GDD(value), .groups = "drop") %>%
  # Back to the original format.
  pivot_wider(names_from = year, values_from = total, names_prefix = "GDD") 
  # Selects all columns (in the original df) from year01 to year<meandoy>.
  filter(doy <= meandoy) %>%
  # Calculate the GDD  
  group_by(id, year) %>%
  summarize(total = sum.GDD(value), .groups = "drop") %>%
  # Back to the original format.
  pivot_wider(names_from = year, values_from = total, names_prefix = "GDD") 

left_join(df, totals, by = "id")

This should be faster than an approach doing row-wise operations and/or loops.

Upvotes: 2

Related Questions