Reputation: 359
Problem:
I want to rename a large number of column names by replacing certain repeated strings.
Reprex:
library(dplyr)
library(stringr)
code <- c(round(runif(26, 0, 100),0))
names <- letters
AIYN <- stringi::stri_rand_strings(26, 2)
SIYN <- stringi::stri_rand_strings(26, 2)
df <- bind_cols(code, names, AIYN, SIYN)
colnames(df) <- c("code (2021)", "names (2021)", "all the info you need (AIYN) from A to Z",
"some info you need (SIYN) from A to Z")
View(df)
Attempted Solution
colnames(df) <- str_replace_all(colnames(df), "[(2021)]", "")
colnames(df) <- str_replace_all(colnames(df), "all the info you need (AIYN) from A to Z", "AIYN")
colnames(df) <- str_replace_all(colnames(df), "some info you need (SIYN) from A to Z", "SIYN")
Goal
I want to remove brackets that contain only numbers (e.g. "(2019)"), and where brackets contain only letters (e.g. "(AIYN)", "(SIYN)"), keep just those letters as the column name. My solution is long-winded because my data frame has over a hundred columns.
Upvotes: 1
Views: 221
Reputation: 626738
To remove parentheses that contain only digits, use any one of the following:
stringr::str_replace_all(colnames(df), "\\s*\\(\\d+\\)", "")
stringr::str_remove_all(colnames(df), "\\s*\\(\\d+\\)")
gsub("\\s*\\(\\d+\\)", "", colnames(df))
If the number inside the parentheses must consist of exactly 4 digits, replace \d+ with \d{4}.
Wrap the above code in trimws(...) to strip leading/trailing whitespace.
See the regex demo.
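As a rough illustration of the two tweaks above (the four-digit restriction and the trimws() wrapper), here is a small base R sketch. The vector x and the names in it are made up for the example and are not from the question's data frame.

```r
# Hypothetical column names: a trailing year, a leading year, and a short number
x <- c("code (2021)", "(2021) names", "id (7)")

# \\d{4} only matches exactly four digits, so "(7)" is left untouched
four_digit <- gsub("\\s*\\(\\d{4}\\)", "", x)

# trimws() removes the space left behind when the year came first
cleaned <- trimws(four_digit)
cleaned
# [1] "code"   "names"  "id (7)"
```

Without trimws(), the second element would come back as " names" with a leading space.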
To keep the first letter-only value inside parentheses, use one of:
stringr::str_extract(colnames(df), '(?<=\\()[A-Za-z]+(?=\\))') # ASCII only
stringr::str_extract(colnames(df), '(?<=\\()\\p{L}+(?=\\))') # Any Unicode
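To show where the two patterns differ, here is a minimal sketch with one ASCII and one accented abbreviation. The second name is invented for the example and is written with \u escapes so the script does not depend on the file encoding.

```r
library(stringr)

# Hypothetical names: the second abbreviation contains non-ASCII letters
x <- c("all the info you need (AIYN) from A to Z",
       "\u00e9t\u00e9 (\u00c9T\u00c9) totals")

# [A-Za-z]+ cannot start at the accented letter, so the second name gives NA
ascii_only <- str_extract(x, '(?<=\\()[A-Za-z]+(?=\\))')

# \\p{L}+ matches any Unicode letter, so both abbreviations are extracted
any_letter <- str_extract(x, '(?<=\\()\\p{L}+(?=\\))')

ascii_only
any_letter
```

If your column names are plain English, the ASCII version is enough; the \p{L} version is the safer choice when names may contain accented letters.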
Combining both:
colnames(df) <- coalesce(str_extract(colnames(df), '(?<=\\()[A-Za-z]+(?=\\))'), str_replace_all(colnames(df), "\\s*\\(\\d+\\)", ""))
R test
library(dplyr)
library(stringr)
x <- c("code (2021)", "names (2021)", "all the info you need (AIYN) from A to Z",
"some info you need (SIYN) from A to Z")
z <- str_replace_all(x, "\\s*\\(\\d+\\)", "")
# => [1] "code" "names" "all the info you need (AIYN) from A to Z"
#    [4] "some info you need (SIYN) from A to Z"
y <- str_extract(z, '(?<=\\()[A-Za-z]+(?=\\))')
# => [1] NA NA "AIYN" "SIYN"
coalesce(y, z)
# => [1] "code" "names" "AIYN" "SIYN"
Upvotes: 1
Reputation: 388862
You can try -
library(magrittr)
names(df) <- sub('\\s\\(\\d+\\)', '', names(df)) %>%
sub('.*\\(([A-Z]+)\\).*', '\\1', .)
names(df)
#[1] "code" "names" "AIYN" "SIYN"
The first sub drops a number inside parentheses along with the whitespace character before it.
The second sub extracts a run of one or more [A-Z] characters inside parentheses and uses it as the whole name.
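To see what each sub contributes, the two steps can be run separately. This is just a sketch on a two-element vector taken from the question's column names; step1 and step2 are names introduced here for illustration.

```r
x <- c("code (2021)", "all the info you need (AIYN) from A to Z")

# Step 1: remove one whitespace character plus the parenthesised number
step1 <- sub("\\s\\(\\d+\\)", "", x)
step1
# [1] "code"                                     "all the info you need (AIYN) from A to Z"

# Step 2: the pattern spans the whole string, so replacing it with the
# captured group \\1 keeps only the letters; names with no match are unchanged
step2 <- sub(".*\\(([A-Z]+)\\).*", "\\1", step1)
step2
# [1] "code" "AIYN"
```

Note that [A-Z] only matches uppercase letters, so a lowercase abbreviation such as "(aiyn)" would be left as is.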
To use this with dplyr and pipes -
library(dplyr)
df %>%
rename_with(~sub('\\s\\(\\d+\\)', '', .) %>%
sub('.*\\(([A-Z]+)\\).*', '\\1', .))
# code names AIYN SIYN
# <dbl> <chr> <chr> <chr>
# 1 1 a 1A NR
# 2 96 b Dq hi
# 3 46 c 28 AQ
# 4 78 d Y8 xH
# 5 76 e ps ES
# 6 56 f m5 gQ
# 7 51 g vV 8u
# 8 72 h Hw JV
# 9 24 i 0T 7A
#10 76 j mq Qy
# … with 16 more rows
Upvotes: 1