Reputation: 388
I need to rename several columns whose names follow a string pattern. Let's use this data frame as an example.
library(tidyverse)  # tibble is attached as part of the tidyverse
df = as_tibble(matrix(0, nrow = 3, ncol = 30), .name_repair = "unique")  # as.tibble() is deprecated
colnames(df) = c("p1", "BNT2", "BNT3", "BNT4","BNT5","BNT6","BNT7","BNT8","BNT9","BNT10",
"BNT11","BNT12","BNT13","BNT14" ,"BNT15", "groupTime186", "groupTime187", "groupTime188", "groupTime189", "groupTime190", "groupTime191",
"groupTime192", "groupTime193", "groupTime194", "groupTime195" ,"groupTime196", "groupTime197",
"groupTime198", "groupTime199", "groupTime200")
# A tibble: 3 x 30
p1 BNT2 BNT3 BNT4 BNT5 BNT6 BNT7 BNT8 BNT9 BNT10 BNT11 BNT12 BNT13 BNT14 BNT15 groupTime186 groupTime187 groupTime188
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# ... with 12 more variables: groupTime189 <dbl>, groupTime190 <dbl>, groupTime191 <dbl>, groupTime192 <dbl>, groupTime193 <dbl>,
# groupTime194 <dbl>, groupTime195 <dbl>, groupTime196 <dbl>, groupTime197 <dbl>, groupTime198 <dbl>, groupTime199 <dbl>,
# groupTime200 <dbl>
Normally I would use gsub and set_names to capture the item number and to construct the new name. Like this:
df %>%
set_names(gsub("p([0-9]{1,2})|BNT([0-9]{1,2})", "BOS_\\1\\2_cod", names(.)))
With this I can reuse the sequential numbers from the original names. The problem is that, because of the software we use to export responses, the time columns usually have numbering that does not start at 01 (here it runs from 186 to 200), so I can't reuse it. Instead, I have to select only the time columns, use colnames and paste0 to construct the names, and then rejoin the time columns to the rest. Like this:
df_time = select(df, contains("Time"))  # keep only the 15 time columns
colnames(df_time) = paste0("BOS_", sprintf("%02d", 1:15), "_time")
I don't think this is a good approach because it requires more steps and is not embedded in the original piped code that renames the answer columns.
My question is: how can I select the columns to be renamed and feed them a vector that contains the new names? Alternatively, can I use a sequence, like sprintf("%02d", 1:15), so that gsub replaces the first column name with the first term of the sequence, and so on? Ideally, I want a solution that can be embedded in piped code (dplyr).
UPDATE: The expected output is the same dataframe but named in this way:
[1] "BOS_01_raw" "BOS_02_raw" "BOS_03_raw" "BOS_04_raw" "BOS_05_raw" "BOS_06_raw" "BOS_07_raw" "BOS_08_raw" "BOS_09_raw" "BOS_10_raw"
[11] "BOS_11_raw" "BOS_12_raw" "BOS_13_raw" "BOS_14_raw" "BOS_15_raw" "BOS_01_time" "BOS_02_time" "BOS_03_time" "BOS_04_time" "BOS_05_time"
[21] "BOS_06_time" "BOS_07_time" "BOS_08_time" "BOS_09_time" "BOS_10_time" "BOS_11_time" "BOS_12_time" "BOS_13_time" "BOS_14_time" "BOS_15_time"
As I said before, I can rename the BNT items because they are already numbered, but the groupTime columns are the problem.
Upvotes: 2
Views: 2955
Reputation: 388
I managed to solve the problem thanks to @mt1022's comment. According to How to rename multiple columns given character vectors of column names and replacement in dplyr 0.6.0?:
First, a vector with the new names has to be created.
names_boston = paste0("BOS_", sprintf("%02d", 1:15), "_time")  # one name per time column (15 in the example)
Then the columns can be selected using grep, and the new names can be fed to rename_at.
df %>%
rename_at(vars(grep("Time", names(.))), ~names_boston)
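The replacement vector must contain exactly one name per selected column, so it can help to derive its length from the selection rather than hard-coding it. The following is a minimal base R sketch of that idea, working on the example's column names only; old_names, time_idx, new_time and new_names are illustrative names, not part of the original code.

```r
# Column names from the question's example data frame.
old_names <- c("p1", paste0("BNT", 2:15), paste0("groupTime", 186:200))

# grep() returns the integer positions of the names that contain "Time".
time_idx <- grep("Time", old_names)

# Build one new name per matched column, numbered from 01.
new_time <- sprintf("BOS_%02d_time", seq_along(time_idx))

# Assign the new names by position; the other names are left untouched.
new_names <- old_names
new_names[time_idx] <- new_time
```

Because the sequence comes from seq_along(time_idx), the same lines should keep working if the export contains a different number of time columns.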
To avoid creating a separate vector, you can build the names inline in the same call:
df %>%
rename_at(vars(grep("Time", names(.))), ~paste0("BOS_", sprintf("%02d", 1:15), "_time"))
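In dplyr 1.0.0 and later, rename_at is superseded by rename_with, which passes the selected names to a function and so lets the numbering be generated from the selection itself. The sketch below is one possible way to produce the expected output from the question in a single pipe, not the only one. It assumes dplyr >= 1.0.0, and the regular expression used to pick the answer columns is an assumption based on the example names.

```r
library(dplyr)
library(tibble)

# Rebuild the example data frame with the original column names.
old_names <- c("p1", paste0("BNT", 2:15), paste0("groupTime", 186:200))
df <- as_tibble(matrix(0, nrow = 3, ncol = 30, dimnames = list(NULL, old_names)))

renamed <- df %>%
  # Answer columns: "p1" and "BNT2" ... "BNT15", renumbered by position.
  rename_with(~ sprintf("BOS_%02d_raw", seq_along(.x)),
              .cols = matches("^(p|BNT)[0-9]+$")) %>%
  # Time columns: numbered from 01 regardless of their original suffix.
  rename_with(~ sprintf("BOS_%02d_time", seq_along(.x)),
              .cols = contains("Time"))

names(renamed)
```

Note that this numbers the columns by their position within the selection, so it relies on the columns already being in the intended order.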
Upvotes: 4