Reputation: 103
I am trying to remove a specific word from a string, namely the part before the first occurrence of a hyphen or underscore.
Below are examples of the input strings I will receive:
Sec_GHTY_WE
NewSec_JOL_ru
Sec-KIH-YRK
Sec_PWq-FTF
NewSec-LPO-WE
From the above strings I want to remove 'Sec' or 'NewSec' and replace it with some other word such as 'MYD'. Alternatively, I can extract only the part of the string after the first occurrence of a hyphen or underscore. The expected output would look like this:
MYD_GHTY_WE  (OR)  GHTY_WE
MYD_JOL_ru   (OR)  JOL_ru
MYD-KIH-YRK  (OR)  KIH-YRK
MYD_PWq-FTF  (OR)  PWq-FTF
MYD-LPO-WE   (OR)  LPO-WE
I have tried the code below, but it does not give me the expected output with paste. However, I have to use paste to produce the final output.
a <- "Sec_GHTY_WE"
paste0((a %>% gsub('\\bSec', "text1", ., ignore.case = TRUE) %>% gsub('\\bNewSec', "text2", ., ignore.case = TRUE)), "_", 1)
I need a solution in base R.
Upvotes: 1
Views: 1554
Reputation: 389155
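Assuming your data frame is df and the column is called V1.
Replace the text up to and including the first underscore or hyphen with 'MYD_' (note that a hyphen separator becomes an underscore here):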
df$V2 <- sub('.*?[-_]', 'MYD_', df$V1)
df
# V1 V2
#1 Sec_GHTY_WE MYD_GHTY_WE
#2 NewSec_JOL_ru MYD_JOL_ru
#3 Sec-KIH-YRK MYD_KIH-YRK
#4 Sec_PWq-FTF MYD_PWq-FTF
#5 NewSec-LPO-WE MYD_LPO-WE
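If the original separator should be kept, as in the expected output in the question, one possible variation is to capture the separator and put it back with a backreference. This is only a sketch on a plain character vector; x is an assumed stand-in for df$V1.

```r
# x is a hypothetical stand-in for df$V1
x <- c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK", "Sec_PWq-FTF", "NewSec-LPO-WE")

# Capture the first separator in group 1 and re-insert it after 'MYD'
keep_sep <- sub('^.*?([-_])', 'MYD\\1', x)
keep_sep
# [1] "MYD_GHTY_WE" "MYD_JOL_ru"  "MYD-KIH-YRK" "MYD_PWq-FTF" "MYD-LPO-WE"
```

Strings that contain neither a hyphen nor an underscore do not match and are returned unchanged.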
Replace either 'Sec' or 'NewSec':
df$V2 <- sub('Sec|NewSec', 'MYD', df$V1)
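Since the question says the final output has to go through paste, the same idea can probably be split into two steps: strip the prefix, then prepend the new word with paste0. This is a sketch, not a tested drop-in; x is an assumed stand-in for df$V1, and anchoring the pattern with ^ is my addition so that a 'Sec' later in the string is never touched.

```r
# x is a hypothetical stand-in for df$V1
x <- c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK", "Sec_PWq-FTF", "NewSec-LPO-WE")

# Remove 'Sec' or 'NewSec' only when it appears at the start of the string
rest <- sub('^(New)?Sec', '', x)

# Prepend the replacement word; the original separator is still in 'rest'
result <- paste0('MYD', rest)
result
# [1] "MYD_GHTY_WE" "MYD_JOL_ru"  "MYD-KIH-YRK" "MYD_PWq-FTF" "MYD-LPO-WE"
```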
Upvotes: 2
Reputation: 2849
Here is a solution using the dplyr and stringr packages.
The string_2 regular expression matches everything up to, but not including, the first _ or - and replaces it with MYD.
The string_3 regular expression removes everything up to and including the first _ or -.
library(dplyr)
library(stringr)
df <- tibble(string_1 = c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK", "Sec_PWq-FTF", "NewSec-LPO-WE"))
df %>%
  mutate(
    string_2 = str_replace(string_1, pattern = "^.*?(?=-|_)", "MYD"),
    string_3 = str_remove(string_1, pattern = "^.*?(_|-)")
  )
#> # A tibble: 5 x 3
#> string_1 string_2 string_3
#> <chr> <chr> <chr>
#> 1 Sec_GHTY_WE MYD_GHTY_WE GHTY_WE
#> 2 NewSec_JOL_ru MYD_JOL_ru JOL_ru
#> 3 Sec-KIH-YRK MYD-KIH-YRK KIH-YRK
#> 4 Sec_PWq-FTF MYD_PWq-FTF PWq-FTF
#> 5 NewSec-LPO-WE MYD-LPO-WE LPO-WE
Created on 2020-11-16 by the reprex package (v0.3.0)
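For anyone who needs to stay in base R, as the question asks, the two patterns above should carry over to sub() if perl = TRUE is set, because the lookahead (?=...) is not available in the default regex engine. A minimal sketch, with x as an assumed plain character vector rather than a tibble column:

```r
# x is a hypothetical stand-in for the string_1 column
x <- c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK", "Sec_PWq-FTF", "NewSec-LPO-WE")

# Lookahead: replace everything before the first separator, keep the separator
string_2 <- sub("^.*?(?=[-_])", "MYD", x, perl = TRUE)

# Remove everything up to and including the first separator
string_3 <- sub("^.*?[-_]", "", x, perl = TRUE)

string_2
# [1] "MYD_GHTY_WE" "MYD_JOL_ru"  "MYD-KIH-YRK" "MYD_PWq-FTF" "MYD-LPO-WE"
string_3
# [1] "GHTY_WE" "JOL_ru"  "KIH-YRK" "PWq-FTF" "LPO-WE"
```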
Upvotes: 2
Reputation: 887541
We can use str_replace from stringr:
library(stringr)
df$V2 <- str_replace(df$V1, 'Sec|NewSec', 'MYD')
Upvotes: 0