Reputation: 323
I have messy column names in the following format: the column name in English, followed by a slash (/), followed by the same words in French, and then the year. For example:
CSD Code / Code de la SDR 2011
Education / Scolarité 2011
Labour Force Activity / Activité sur le marché du travail 2011
Is there a tidyverse-friendly way to rename all the columns by removing the slash (/) and everything after it, while keeping the year? For example:
CSD Code 2011
Education 2011
Labour Force Activity 2011
Upvotes: 0
Views: 590
Reputation: 388862
You can extract the part of the string that you want to keep with capture groups.
library(dplyr)
# keep group 1 (text before the slash) and group 2 (the trailing digits)
df %>% rename_with(~sub('(.*)/.*?(\\d+)', '\\1\\2', .))
Using the sample data x from @MrFlick's answer:
sub('(.*)/.*?(\\d+)', '\\1\\2', x)
#[1] "CSD Code 2011" "Education 2011"
#[3] "Labour Force Activity 2011"
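A minimal end-to-end sketch of the rename_with() call above, assuming a small hypothetical data frame df whose column names are the three sample headers (check.names = FALSE keeps the spaces and slashes intact). rename_with() requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical one-row data frame whose column names follow the
# "English / French year" pattern from the question
df <- data.frame(
  "CSD Code / Code de la SDR 2011" = 1,
  "Education / Scolarité 2011" = 2,
  "Labour Force Activity / Activité sur le marché du travail 2011" = 3,
  check.names = FALSE
)

# Group 1: everything before the slash (including its trailing space)
# Group 2: the digits of the year; the slash and French text are dropped
df_clean <- df %>% rename_with(~ sub('(.*)/.*?(\\d+)', '\\1\\2', .x))

names(df_clean)
```

Only the names change; the column values are untouched.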
Upvotes: 1
Reputation: 206197
You can use a regular expression. With the sample data:
x <- c("CSD Code / Code de la SDR 2011",
"Education / Scolarité 2011",
"Labour Force Activity / Activité sur le marché du travail 2011")
With the tidyverse package stringr, you get:
stringr::str_replace(x, " / \\D*(?= \\d+$)", "")
# [1] "CSD Code 2011"
# [2] "Education 2011"
# [3] "Labour Force Activity 2011"
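The pattern matches a space, a slash and a space, followed by a run of non-digit characters (the French label). The lookahead (?= \\d+$) ends the match just before the space and digits at the end of the string, so that space and the year are kept.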
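For comparison, a sketch of the same pattern in base R. stringr uses the ICU engine, which supports lookahead; base R's default engine (TRE) does not, so this sketch passes perl = TRUE to switch to PCRE.

```r
x <- c("CSD Code / Code de la SDR 2011",
       "Education / Scolarité 2011",
       "Labour Force Activity / Activité sur le marché du travail 2011")

# Pattern pieces:
#   " / "         literal space, slash, space
#   "\\D*"        greedy run of non-digits (the French label)
#   "(?= \\d+$)"  lookahead: stop just before " <digits>" at the end
# perl = TRUE is needed because the lookahead is a PCRE feature
cleaned <- sub(" / \\D*(?= \\d+$)", "", x, perl = TRUE)
cleaned
```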
You can use the same pattern with dplyr::rename_with() to fix the column names:
library(dplyr)
# apply the same str_replace() call to every column name
my_data %>%
  rename_with(~stringr::str_replace(., " / \\D*(?= \\d+$)", ""))
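If the lookahead feels opaque, a two-step alternative (not from either answer) is to cut the name at " / " and paste the trailing year back on. This is a sketch; the data frame my_data and the helper english_with_year() are hypothetical names used for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical data frame with the question's column names
my_data <- data.frame(
  "CSD Code / Code de la SDR 2011" = "a",
  "Education / Scolarité 2011" = "b",
  "Labour Force Activity / Activité sur le marché du travail 2011" = "c",
  check.names = FALSE
)

# Hypothetical helper: English label before " / " plus the trailing year
english_with_year <- function(nms) {
  label <- str_remove(nms, " / .*$")  # drop " / " and everything after it
  year  <- str_extract(nms, "\\d+$")  # digits at the very end of the name
  paste(label, year)
}

my_data_clean <- my_data %>% rename_with(english_with_year)
names(my_data_clean)
```

Note that a name without a trailing year would get " NA" appended, so if some columns do not follow the pattern, you could restrict the rename with .cols = matches(" / ").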
Upvotes: 2