Reputation: 35
I am trying to remove some unwanted characters (the leading numbers, the . and the space) from my column names in R. My column names are shown below.
My data is a tibble:
tibble [33 x 38] (S3: tbl_df/tbl/data.frame)
$ year : chr [1:33] "1988" "1989" "1990" "1991" ...
$ VALOR AGREGADO BRUTO (a precios básicos) : num [1:33] 9906283 11624212 14163419 17400488 19785184 ...
$ 1. PRODUCTOS AGRÍCOLAS NO INDUSTRIALES : num [1:33] 831291 911372 1112167 1434213 1532067 ...
$ 2. PRODUCTOS AGRÍCOLAS INDUSTRIALES : num [1:33] 143426 214369 231168 341144 282777 ...
$ 3. COCA : num [1:33] 118273 153689 195108 190264 199259 ...
And my desired column names are:
tibble [33 x 38] (S3: tbl_df/tbl/data.frame)
$ year : chr [1:33] "1988" "1989" "1990" "1991" ...
$ VALOR AGREGADO BRUTO (a precios básicos) : num [1:33] 9906283 11624212 14163419 17400488 19785184 ...
$ PRODUCTOS AGRÍCOLAS NO INDUSTRIALES : num [1:33] 831291 911372 1112167 1434213 1532067 ...
$ PRODUCTOS AGRÍCOLAS INDUSTRIALES : num [1:33] 143426 214369 231168 341144 282777 ...
$ COCA : num [1:33] 118273 153689 195108 190264 199259 ...
I want to remove the number and the . from the column names. This is what I tried:
colnames(data) <- sub("\\1:4\.\\", "", colnames(data))
colnames(data)
Could somebody please help me?
Best! Marcelo
Upvotes: 3
Views: 1354
Reputation: 5204
It's not clear what was wrong with the answers you got, but here's another try. Since you're showing a data.frame and want to rename the columns, you can use str_remove() inside dplyr::rename_with(). Also, since your data has 38 columns, I'm guessing you may need to remove numbers other than just 1-4. To accommodate that, I opened the range to all digits by using [0-9] and allowed either 1- or 2-digit numbers by adding {1,2} after the digit class.
library(tidyverse)
# took the column names you showed and added one with a higher number
d <- tibble(year = 1:5,
            "VALOR AGREGADO BRUTO (a precios básicos)" = 1:5,
            "1. PRODUCTOS AGRÍCOLAS NO INDUSTRIALES" = 1:5,
            "2. PRODUCTOS AGRÍCOLAS INDUSTRIALES" = 1:5,
            "3. COCA" = 1:5,
            "29. OTHER" = 1:5)
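# rename_with() takes a renaming function; the pattern is anchored with ^
# and the dot is escaped, so only a leading "<digits>. " prefix is removed
d %>%
  rename_with(~ str_remove(.x, "^[0-9]{1,2}\\. "))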
#> # A tibble: 5 x 6
#> year `VALOR AGREGADO BRUTO ~` `PRODUCTOS AGR~` `PRODUCTOS AGR~` COCA OTHER
#> <int> <int> <int> <int> <int> <int>
#> 1 1 1 1 1 1 1
#> 2 2 2 2 2 2 2
#> 3 3 3 3 3 3 3
#> 4 4 4 4 4 4 4
#> 5 5 5 5 5 5 5
Created on 2022-02-17 by the reprex package (v2.0.1)
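One caveat worth sketching: in a regular expression an unescaped . matches any character, and a pattern without ^ can match in the middle of a name. The column names below are made up for illustration (the third one is hypothetical and not from the question); they show how a loose pattern and a strict one can differ.

```r
library(stringr)

# Hypothetical column names, made up for illustration
nms <- c("3. COCA", "29. OTHER", "TOTAL 10% TAX")

# Unescaped dot, no anchor: "10% " also matches, because "." matches "%"
loose <- str_remove(nms, "[0-9]{1,2}. ")
loose
#> [1] "COCA"      "OTHER"     "TOTAL TAX"

# Anchored, escaped dot: only a leading "<digits>. " prefix is removed
strict <- str_remove(nms, "^[0-9]{1,2}\\. ")
strict
#> [1] "COCA"          "OTHER"         "TOTAL 10% TAX"
```

For the names shown in the question both patterns give the same result; the stricter form only matters if a digit-character-space sequence can appear elsewhere in a name.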
Upvotes: 2
Reputation: 7106
We can use the pattern below, which reads: match one or more digits at the start of the string, followed by a dot and a space, and replace that match with an empty string.
library(stringr)
data <- c("1. PRODUCTOS AGRÍCOLAS NO INDUSTRIALES",
          "2. PRODUCTOS AGRÍCOLAS INDUSTRIALES",
          "3. SILVICULTURA, CAZA Y PESCA",
          "4. PRODUCTOS PECUARIOS")
str_replace(data, '^\\d+\\. ', "")
#> [1] "PRODUCTOS AGRÍCOLAS NO INDUSTRIALES" "PRODUCTOS AGRÍCOLAS INDUSTRIALES"
#> [3] "SILVICULTURA, CAZA Y PESCA" "PRODUCTOS PECUARIOS"
Created on 2022-02-16 by the reprex package (v2.0.1)
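To tie this back to the question, the same pattern can probably be applied straight to the column names with base R's sub(), which is close to what the asker attempted. This is only a minimal sketch: the data frame df below is a made-up stand-in for the real tibble, with two of the column names from the question.

```r
# Made-up data frame standing in for the asker's tibble;
# check.names = FALSE keeps the non-syntactic column names as written
df <- data.frame(
  year = c("1988", "1989"),
  `1. PRODUCTOS AGRÍCOLAS NO INDUSTRIALES` = c(831291, 911372),
  `3. COCA` = c(118273, 153689),
  check.names = FALSE
)

# Drop a leading "<digits>. " prefix from every column name;
# names without such a prefix (like "year") are left unchanged
colnames(df) <- sub("^[0-9]+\\. ", "", colnames(df))
colnames(df)
```

After this, the column names should be year, PRODUCTOS AGRÍCOLAS NO INDUSTRIALES and COCA. sub() replaces only the first match in each string, which is enough here because the pattern is anchored at the start.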
Upvotes: 0