dano_

Reputation: 323

R rename all columns with regex

I have messy column names in the following format: the column name in English, followed by a slash (/), followed by the same name in French with the year. For example:

CSD Code / Code de la SDR 2011, Education / Scolarité 2011, Labour Force Activity / Activité sur le marché du travail 2011

Is there a tidyverse-friendly solution that will let me rename all the columns by removing the French text after the slash (/) but keeping the year? For example: CSD Code 2011, Education 2011, Labour Force Activity 2011

Upvotes: 0

Views: 590

Answers (2)

Ronak Shah

Reputation: 388862

You can extract the part of the string that you want to keep with capture groups.

library(dplyr)

df %>% rename_with(~sub('(.*)/.*?(\\d+)', '\\1\\2', .))

Using the sample vector x from @MrFlick's answer:

sub('(.*)/.*?(\\d+)', '\\1\\2', x)
#[1] "CSD Code 2011"              "Education 2011"            
#[3] "Labour Force Activity 2011"
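To see what each capture group holds, here is a minimal base R sketch. The variable names (pattern, parts, english, year) are only illustrative and are not part of the answer above.

```r
# One of the sample column names from the question.
x <- "CSD Code / Code de la SDR 2011"

# Same pattern as above: group 1 is the text before the slash,
# group 2 is the run of digits at the end.
pattern <- "(.*)/.*?(\\d+)"

# regexec() returns the full match followed by each capture group.
parts <- regmatches(x, regexec(pattern, x))[[1]]

english <- parts[2]  # English name, trailing space included
year    <- parts[3]  # the year digits

# Gluing the two groups back together is what '\\1\\2' does in sub().
paste0(english, year)
```

Because group 1 keeps the space before the slash, no extra separator is needed between the two groups in the replacement.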

Upvotes: 1

MrFlick

Reputation: 206197

You can use a regular expression. With the sample data:

x <- c("CSD Code / Code de la SDR 2011", 
        "Education / Scolarité 2011", 
        "Labour Force Activity / Activité sur le marché du travail 2011")

You can use the tidyverse package stringr and get

stringr::str_replace(x, " / \\D*(?= \\d+$)", "")
# [1] "CSD Code 2011"             
# [2] "Education 2011"            
# [3] "Labour Force Activity 2011"

The expression matches a space, a slash, and a space, followed by any run of non-digit characters. The lookahead (?= \\d+$) stops the match just before the final space and digits, so the year stays in place after the English name.

To apply it to column names, use it inside dplyr::rename_with():

my_data %>% 
  rename_with(~stringr::str_replace(., " / \\D*(?= \\d+$)", ""))
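For a fully reproducible check, here is a small sketch with a made-up data frame. The values and the name cleaned are hypothetical; only the column names come from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical data; tibble() keeps the non-syntactic column names as written.
my_data <- tibble(
  `CSD Code / Code de la SDR 2011` = c(1001, 1002),
  `Labour Force Activity / Activité sur le marché du travail 2011` = c(0.61, 0.58)
)

# rename_with() applies the function to every column name by default.
cleaned <- my_data %>%
  rename_with(~ str_replace(.x, " / \\D*(?= \\d+$)", ""))

names(cleaned)
```

Column names that do not contain " / " followed by a trailing year are left unchanged, since str_replace() returns its input when the pattern does not match.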

Upvotes: 2
