Reputation: 33
I'm a bit new to R and trying to find a simplified way of creating multiple columns based on a formula.
I have a dataset that has a base date followed by scores that were taken weekly (score1 = score from 1 week after base date). I would like to generate a date for each week i.e. adding X*7 to the base date. I have found a way to do this by simply creating each date variable one at a time (see below) but since I have over 500 scores, I was wondering if there is a simplified way of doing this that does not take up hundreds of lines of code.
Dataset$score1_date <- Dataset$base_date + (1*7)
Dataset$score2_date <- Dataset$base_date + (2*7)
Dataset$score3_date <- Dataset$base_date + (3*7)
Here is an example dataset:
Dataset <- structure(list(id = c(1, 2, 3), base_date = structure(c(18628, 18633, 18641), class = "Date"), score1 = c(4, 5, 5), score2 = c(6, 5, 2), score3 = c(5, 5, 1)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
Thank you!
Upvotes: 3
Views: 987
Reputation: 89
You can try using a for loop and referring to a column of a data.frame using double brackets (i.e. [[ ]]). For example:
for (i in 1:500) {
  Dataset[[paste0("score", i, "_date")]] <- Dataset$base_date + (i * 7)
}
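If the number of score columns is not known ahead of time, the week indices could be read from the column names instead of hard-coding 500. A minimal sketch in base R, assuming the columns are named exactly score1, score2, and so on; the example data is rebuilt here as a plain data.frame, and score_cols and weeks are illustrative names, not part of the original answer:

```r
# Example data from the question, rebuilt as a plain data.frame
Dataset <- data.frame(
  id = c(1, 2, 3),
  base_date = as.Date(c("2021-01-01", "2021-01-06", "2021-01-14")),
  score1 = c(4, 5, 5),
  score2 = c(6, 5, 2),
  score3 = c(5, 5, 1)
)

# Assumption: score columns match "score" followed only by digits
score_cols <- grep("^score[0-9]+$", names(Dataset), value = TRUE)

# Week number taken from the digits in each column name
weeks <- as.integer(sub("^score", "", score_cols))

# Add one date column per score column
for (k in seq_along(score_cols)) {
  Dataset[[paste0(score_cols[k], "_date")]] <- Dataset$base_date + weeks[k] * 7
}
```

Anchoring the pattern with ^ and $ keeps the loop from picking up the new scoreN_date columns if it is run a second time.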
Upvotes: 1
Reputation: 887028
We can use lapply to loop over the multiplier index (i.e. 1:3 in the OP's post), multiply it by 7 and add it to base_date, then assign the resulting list of vectors to new columns whose names are built by pasting 'score' with the index and '_date':
Dataset[paste0('score', 1:3, '_date')] <- lapply(1:3,
    function(i) Dataset$base_date + i*7)
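As a sanity check, the lapply result can be compared with the one-at-a-time assignments from the question. A rough sketch on the example data, rebuilt as a plain data.frame; the object names base, manual and looped are only illustrative:

```r
# Example data from the question, rebuilt as a plain data.frame
base <- data.frame(
  id = c(1, 2, 3),
  base_date = as.Date(c("2021-01-01", "2021-01-06", "2021-01-14")),
  score1 = c(4, 5, 5),
  score2 = c(6, 5, 2),
  score3 = c(5, 5, 1)
)

# One column at a time, as in the question
manual <- base
manual$score1_date <- manual$base_date + (1 * 7)
manual$score2_date <- manual$base_date + (2 * 7)
manual$score3_date <- manual$base_date + (3 * 7)

# All columns at once with lapply
looped <- base
looped[paste0("score", 1:3, "_date")] <- lapply(1:3,
    function(i) looped$base_date + i * 7)

# TRUE when both approaches give the same data.frame
same_result <- isTRUE(all.equal(manual, looped))
print(same_result)
```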
Or, using dplyr, loop across the 'score' columns, extract the numeric part of each column name (cur_column()) with parse_number, multiply it by 7 and add it to 'base_date', and set .names to append '_date' so that new columns are created:
library(dplyr)
Dataset <- Dataset %>%
  mutate(across(starts_with('score'), ~ base_date +
    (readr::parse_number(cur_column())) * 7, .names = '{.col}_date'))
Output:
Dataset
# A tibble: 3 x 8
#      id base_date  score1 score2 score3 score1_date score2_date score3_date
#   <dbl> <date>      <dbl>  <dbl>  <dbl> <date>      <date>      <date>
#1      1 2021-01-01      4      6      5 2021-01-08  2021-01-15  2021-01-22
#2      2 2021-01-06      5      5      5 2021-01-13  2021-01-20  2021-01-27
#3      3 2021-01-14      5      2      1 2021-01-21  2021-01-28  2021-02-04
Upvotes: 2