Reputation: 507
I have a large data set with thousands of columns. The column names include various unwanted characters as follows:
col1_3x_xxx
col2_3y_xyz
col3_3z_zyx
I would like to remove everything from "_3" onward in each column name, leaving clean names:
col1
col2
col3
What is the most efficient way to do this for 5000+ columns?
Upvotes: 25
Views: 82525
Reputation: 530
For anyone looking for a pipeable solution (like I was), use dplyr::rename_with and stringr::str_remove:
library(dplyr)
library(stringr)
tibble(
  col1_3x_xxx = rnorm(3),
  col2_3y_xyz = rnorm(3),
  col3_3z_zyx = rnorm(3)
) %>%
  # Strip "_3" and everything after it from every column name
  rename_with(~ str_remove(., "_3.*"), everything())
# A tibble: 3 × 3
   col1  col2    col3
  <dbl> <dbl>   <dbl>
1 0.819 0.674  2.06
2 0.597 0.554 -0.0586
3 0.490 0.878  0.708
Note: if you were thinking of rename_all(), it has been superseded by rename_with(), as have rename_if() and rename_at().
Upvotes: 3
Reputation:
You can use gsub on the column names:
names(df) = gsub(pattern = "_3.*", replacement = "", x = names(df))
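As a minimal, self-contained sketch of the line above (the toy data frame df and its values are made up for illustration; only the column names follow the question):

```r
# Toy data frame; the column names mimic those in the question,
# the values are arbitrary placeholders.
df <- data.frame(
  col1_3x_xxx = 1:3,
  col2_3y_xyz = 4:6,
  col3_3z_zyx = 7:9
)

# Drop "_3" and everything after it from every column name.
names(df) <- gsub(pattern = "_3.*", replacement = "", x = names(df))

names(df)
```

Because .* consumes the rest of the string, the pattern can match at most once per name, so sub should give the same result as gsub here.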
Upvotes: 3
Reputation: 985
This answer is certainly late, but in case someone is still looking for a solution: for a single column at index col,
colnames(df1)[col] <- sub("_3.*", "", colnames(df1)[col])
And if you have multiple columns:
for (col in seq_len(ncol(df1))) {
  colnames(df1)[col] <- sub("_3.*", "", colnames(df1)[col])
}
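Since sub is vectorized over its input, the loop is not strictly needed. A small sketch of the same renaming in one call, assuming a toy df1 with made-up values:

```r
# Toy data frame (hypothetical values), named as in the question.
df1 <- data.frame(col1_3x_xxx = 1:2, col2_3y_xyz = 3:4, col3_3z_zyx = 5:6)

# sub() accepts a character vector, so all names are cleaned in one call.
colnames(df1) <- sub("_3.*", "", colnames(df1))

colnames(df1)
```

With thousands of columns this also avoids reassigning the names once per iteration.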
Upvotes: 21
Reputation: 886938
We can use sub, here applied to the first column of df1 (assumed to hold the names as strings):
sub("_3.*", "", df1[,1])
#[1] "col1" "col2" "col3"
Upvotes: 18
Reputation: 214927
We can try str_extract with the regular expression pattern "^[^_]+(?=_)":
stringr::str_extract(c("col1_3x_xxx", "col2_3y_xyz", "col3_3z_zyx"), "^[^_]+(?=_)")
[1] "col1" "col2" "col3"
where in the pattern:
The first ^ matches the beginning of the string;
[^_]+ matches one or more non-_ characters, where ^_ inside the brackets means any character but _;
(?=...) stands for lookahead, so we are looking for the pattern that comes ahead of _.
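One thing to keep in mind is that this pattern keeps everything before the first underscore, which is not quite the same as removing from "_3" onward. The sketch below compares the two on hypothetical names, using base R's regexpr with perl = TRUE as a stand-in for str_extract:

```r
# Hypothetical names; the second one has an underscore before "_3".
x <- c("col1_3x_xxx", "my_col_3y_xyz")

# Base R counterpart of the str_extract call: perl = TRUE enables lookahead.
first_part <- regmatches(x, regexpr("^[^_]+(?=_)", x, perl = TRUE))

# Removing from "_3" onward, as asked in the question.
before_3 <- sub("_3.*", "", x)

first_part
before_3
```

For the second name the lookahead pattern yields "my", while the "_3.*" removal yields "my_col". The two approaches agree only when the wanted prefix contains no underscore.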
Upvotes: 5