Reputation: 437
I'm trying to clean a dataset's names. I've used janitor::clean_names() to start. However, I still have abbreviations that I would like to separate out with an underscore _. I have code that works using rename_with(~str_replace(.x, "gh", "gh_"), .cols = starts_with("gh")), but there are many abbreviations, and it would be good to find a way to map or otherwise functionalize this process.
library(tidyverse)

dat <- tibble(ghrisk_value = c(1,2),
              ghrisk_corrected = c(2,3),
              devpolicy_value = c(4,5),
              devpolicy_corrected = c(5,6))
# code works but not functionalized
dat %>%
rename_with(~str_replace(.x, "gh", "gh_"), .cols = starts_with("gh")) %>%
rename_with(~str_replace(.x, "dev", "dev_"), .cols = starts_with("dev")) %>%
names()
# attempt at map...
abbr_words <- c("gh", "dev")
map(dat, ~rename_with(str_replace(.x, abbr_words, str_c(abbr_words, "_")))
Upvotes: 2
Views: 516
Reputation: 374
With map() you need a helper function, here repl_func. map() iterates over colnames(dat) and passes one column name at a time to the helper. In the map() call the data comes first, then the function, and any remaining arguments (here abbr_words) follow and are passed on to the function. repl_func takes a single column name and the vector of abbreviations, loops over the abbreviations, and performs each replacement. At the end, unlist() is required to flatten the list that map() returns into a character vector.
abbr_words <- c("gh", "dev")
repl_func <- function(x, y) {
  for (i in y) {
    x <- str_replace(x, i, paste0(i, "_"))
  }
  return(x)
}
colnames(dat) <- unlist(map(colnames(dat), repl_func, abbr_words))
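As a side note, str_replace() is vectorised over its first argument, so the same helper should also work on the whole vector of names at once, without map() or unlist(). The sketch below assumes only stringr is loaded; old_names and new_names are illustrative names, and the input is the set of column names from the question.

```r
library(stringr)

abbr_words <- c("gh", "dev")

# Same helper as above: insert "_" after the first occurrence of each abbreviation.
repl_func <- function(x, y) {
  for (i in y) {
    x <- str_replace(x, i, paste0(i, "_"))
  }
  x
}

# Illustrative input: the column names from the question.
old_names <- c("ghrisk_value", "ghrisk_corrected",
               "devpolicy_value", "devpolicy_corrected")

# The whole character vector is passed in one call.
new_names <- repl_func(old_names, abbr_words)
new_names
```

Note that the pattern is not anchored, so an abbreviation that appears later in a name (not only at the start) would also get an underscore.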
Upvotes: 2
Reputation: 28675
You can reduce() over the words to replace, applying str_replace() at each step.
abbr_words <- c("gh", "dev")
dat %>%
rename_all( ~
reduce(abbr_words, ~str_replace(.x, paste0('^', .y), paste0(.y, '_')), .init = names(dat))
)
# # A tibble: 2 x 4
#   gh_risk_value gh_risk_corrected dev_policy_value dev_policy_corrected
#           <dbl>             <dbl>            <dbl>                <dbl>
# 1             1                 2                4                    5
# 2             2                 3                5                    6
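If this needs to be reused, one option is to pull the reduce() step into a small function and hand it to rename_with(). This is only a sketch: split_abbr and its arguments nms and abbrs are hypothetical names, and it assumes the abbreviations contain no regex special characters.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Hypothetical helper: add "_" after each abbreviation found at the start of a name.
split_abbr <- function(nms, abbrs) {
  reduce(
    abbrs,
    # acc is the names vector so far, abbr is the current abbreviation
    function(acc, abbr) str_replace(acc, paste0("^", abbr), paste0(abbr, "_")),
    .init = nms
  )
}

dat <- tibble(ghrisk_value = c(1, 2),
              ghrisk_corrected = c(2, 3),
              devpolicy_value = c(4, 5),
              devpolicy_corrected = c(5, 6))

# Extra arguments to rename_with() are forwarded to the renaming function.
out <- dat %>% rename_with(split_abbr, abbrs = c("gh", "dev"))
names(out)
```

Because the function receives the names as its first argument, it does not need to refer to dat by name inside the pipeline.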
Upvotes: 2
Reputation: 35554
You don't need map(). Just use the regular expression syntax "(?<=a|b|c)", a lookbehind that matches the position immediately after a, b, or c, and insert an underscore there. In addition, starts_with() can take a character vector as input to match the union of all elements.
abbr_words <- c("gh", "dev")
pattern <- sprintf("(?<=%s)", str_c(abbr_words, collapse = "|"))
# [1] "(?<=gh|dev)"
dat %>%
rename_with(~ str_replace(.x, pattern, "_"), starts_with(abbr_words))
# # A tibble: 2 x 4
#   gh_risk_value gh_risk_corrected dev_policy_value dev_policy_corrected
#           <dbl>             <dbl>            <dbl>                <dbl>
# 1             1                 2                4                    5
# 2             2                 3                5                    6
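A possible variant, not the lookbehind shown above, is to anchor the alternation at the start of the name and put the underscore back with a capture group. That way names that do not start with an abbreviation are left alone even without starts_with(). The sketch assumes stringr only; anchored and example_names are illustrative names.

```r
library(stringr)

abbr_words <- c("gh", "dev")

# Build "^(gh|dev)": the abbreviation must sit at the start of the name.
anchored <- str_c("^(", str_c(abbr_words, collapse = "|"), ")")

# Illustrative names; the last one contains "gh" but does not start with it.
example_names <- c("ghrisk_value", "devpolicy_value", "high_value")

# "\\1" re-inserts the matched abbreviation, followed by the underscore.
result <- str_replace(example_names, anchored, "\\1_")
result
```

In a pipeline this could be passed as rename_with(~ str_replace(.x, anchored, "\\1_")).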
Upvotes: 3