Reputation: 83
I have a dataset like the one below. How can I remove the '#number' parts?
df>
terms year
5;#Remote Production;#10; 2021
53;#=Product-Category:Routing 2021
30;#HDR;#5;#Remote Production 2020
...
I need it to be like this:
df>
terms year
#Remote Production 2021
#Product-Category:Routing 2021
#HDR;#Remote Production 2020
...
The number at the beginning (the one without a leading #) also needs to be removed.
Upvotes: 0
Views: 391
Reputation: 887951
An option with str_remove_all:
library(stringr)
library(dplyr)
# Strip a leading "N;#" (plus an optional "=") and every "#N;", then re-add the leading "#"
df %>%
  mutate(terms = str_c('#', str_remove_all(terms, "^\\d+;#\\=?|#\\d+;")))
-output
# terms year
#1 #Remote Production; 2021
#2 #Product-Category:Routing 2021
#3 #HDR;#Remote Production 2020
# Data used above
df <- structure(list(
  terms = c("5;#Remote Production;#10;", "53;#=Product-Category:Routing",
            "30;#HDR;#5;#Remote Production"),
  year = c(2021L, 2021L, 2020L)
), class = "data.frame", row.names = c(NA, -3L))
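Note that in the output above the first row still ends in a semicolon ("#Remote Production;"), while the requested result has none. If that trailing semicolon should go as well, one possible variant is sketched below. It is only a sketch: `clean_terms` is a hypothetical helper name, and it assumes that a semicolon left at the very end of a string is never wanted.

```r
library(stringr)
library(dplyr)

# Same example data as in the answer
df <- data.frame(
  terms = c("5;#Remote Production;#10;",
            "53;#=Product-Category:Routing",
            "30;#HDR;#5;#Remote Production"),
  year = c(2021L, 2021L, 2020L)
)

# Hypothetical helper (not part of stringr or dplyr)
clean_terms <- function(x) {
  # Same pattern as the answer: leading "N;#" (optional "=") and every "#N;"
  x <- str_remove_all(x, "^\\d+;#=?|#\\d+;")
  # Assumption: a semicolon left dangling at the end is unwanted
  x <- str_remove(x, ";$")
  # Put the leading "#" back
  str_c("#", x)
}

cleaned <- df %>%
  mutate(terms = clean_terms(terms))

print(cleaned)
```

Doing the trailing-semicolon removal as a second pass keeps the original pattern unchanged, so the two steps can be checked separately.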
Upvotes: 4