Reputation: 311
Consider this toy data frame:
df <- data.frame(id = c(1, 2),
meandoy = c(3,2),
temp199701 = c(4,2),
temp199702 = c(15,10),
temp199703 = c(-3,7),
temp199704 = c(-1,6),
temp199801 = c(1,5),
temp199802 = c(9,10),
temp199803 = c(-2,2),
temp199804 = c(-5,11))
I want to add a new column with the result of a function for each year and each row. In other words, each new GDDyear
column gets the value of calculating from tempyear01
to tempyear04
.
I can achieve it with this:
sum.GDD <- function(x) sum(x[x > 5]-5, na.rm = TRUE)
yearlist <- c(1997, 1998)
for (year in yearlist){
text <- paste("GDD",toString(year), sep = "")
df[[text]] <- df %>% #store result in this vector
dplyr::select(contains(toString(year))) %>% #take variables that have year
apply(1, sum.GDD) #calculate GDD5 across those variables
}
But there is a twist. I want to apply the function only to the number of columns specified in meandoy
each year.
For example, GDD1997
in the first row will be the result of calculating the first 3 columns starting from temp199701
, because meandoy = 3
. GDD1998
will get the result from temp199801
, temp199802
and temp199803
.
In the second row the meandoy = 2
so the result of GDD1997
will be calculated from temp199701
and temp199702
. GDD1998
from temp199801
and temp199802
.
Upvotes: 2
Views: 200
Reputation: 10776
If in doubt a problem is usually made simpler by turning the data into a long format.
Since you're already using dplyr
we can:
totals <- df %>%
# Turn the dataframe into format id, meandoy, year, doy, value by parsing
# the columns while unpivoting.
pivot_longer(
c(everything(), -id, -meandoy),
names_to = c("year", "doy"), names_pattern = "temp(\\d{4})(\\d{2})",
names_transform = list(year = as.integer, doy = as.integer)
) %>%
# Selects all columns (in the original df) from year01 to year<meandoy>.
filter(doy <= meandoy) %>%
# Calculate the GDD
group_by(id, year) %>%
summarize(total = sum.GDD(value), .groups = "drop") %>%
# Back to the original format.
pivot_wider(names_from = year, values_from = total, names_prefix = "GDD")
# Selects all columns (in the original df) from year01 to year<meandoy>.
filter(doy <= meandoy) %>%
# Calculate the GDD
group_by(id, year) %>%
summarize(total = sum.GDD(value), .groups = "drop") %>%
# Back to the original format.
pivot_wider(names_from = year, values_from = total, names_prefix = "GDD")
left_join(df, totals, by = "id")
This should be faster than an approach doing row-wise operations and/or loops.
Upvotes: 2