Hmm

Reputation: 103

Remove the word in a string before the first occurrence of a hyphen or underscore in R

I am trying to remove a specific word from a string before the first occurrence of a hyphen or underscore.

Below are examples of the input strings I will receive:

Sec_GHTY_WE
NewSec_JOL_ru
Sec-KIH-YRK
Sec_PWq-FTF
NewSec-LPO-WE

From the strings above, I want to remove 'Sec' or 'NewSec' and replace it with another word, such as 'MYD'. Alternatively, I could extract only the part of the string after the first hyphen or underscore. The expected output would look like this:

MYD_GHTY_WE   (OR)   GHTY_WE
MYD_JOL_ru           JOL_ru
MYD-KIH-YRK          KIH-YRK
MYD_PWq-FTF          PWq-FTF
MYD-LPO-WE           LPO-WE

I tried the code below, but it does not give me the expected output with the paste approach. However, I have to use paste to build the final output.

library(magrittr)  # provides the %>% pipe
a <- "Sec_GHTY_WE"
paste0((a %>% gsub('\\bSec', "text1", ., ignore.case = TRUE) %>% gsub('\\bNewSec', "text2", ., ignore.case = TRUE)), "_", 1)
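For reference, this is roughly what the attempt above returns, rewritten without the pipe so it runs in base R. paste0() simply concatenates its arguments, so the "_" and the number 1 are tacked onto the end of the already-replaced string; the text1/text2 placeholders are the ones from the attempt itself.

```r
a <- "Sec_GHTY_WE"

# Base R equivalent of the piped gsub() chain above
replaced <- gsub("\\bSec", "text1", a, ignore.case = TRUE)
replaced <- gsub("\\bNewSec", "text2", replaced, ignore.case = TRUE)

# paste0() concatenates its arguments, so "_" and 1 end up at the end
attempt <- paste0(replaced, "_", 1)
print(attempt)
#> [1] "text1_GHTY_WE_1"
```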

I need a solution in base R.
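A minimal base R sketch that produces both expected outputs and still builds the final string with paste0(), assuming every input contains at least one hyphen or underscore (a string without one would not be matched by sub() and would come out wrong). The vector name x is illustrative.

```r
x <- c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK", "Sec_PWq-FTF", "NewSec-LPO-WE")

# Everything after the first "_" or "-" (the right-hand column above)
rest <- sub("^[^_-]*[_-]", "", x)

# The first separator itself, so "_" stays "_" and "-" stays "-"
sep <- sub("^[^_-]*([_-]).*$", "\\1", x)

# Rebuild with the new prefix (the left-hand column above)
with_prefix <- paste0("MYD", sep, rest)

print(with_prefix)
print(rest)
```

Inside the brackets, "-" is placed last so it is read as a literal hyphen rather than a range.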

Upvotes: 1

Views: 1554

Answers (3)

Ronak Shah

Reputation: 389155

If your data frame is df and the column is called V1:

Replace the text before the first underscore or hyphen:

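df <- data.frame(V1 = c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK",
                        "Sec_PWq-FTF", "NewSec-LPO-WE"), stringsAsFactors = FALSE)
# .*? lazily matches up to and including the first - or _, which becomes "MYD_"
df$V2 <- sub('.*?[-_]', 'MYD_', df$V1)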
df
#             V1          V2
#1   Sec_GHTY_WE MYD_GHTY_WE
#2 NewSec_JOL_ru  MYD_JOL_ru
#3   Sec-KIH-YRK MYD_KIH-YRK
#4   Sec_PWq-FTF MYD_PWq-FTF
#5 NewSec-LPO-WE  MYD_LPO-WE
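Note that this always writes "_" as the separator, so row 3 becomes MYD_KIH-YRK rather than the MYD-KIH-YRK shown in the question. If the original separator should be kept, one option (a sketch, not part of the original answer) is to replace only the run of characters before the first separator; a string with no "_" or "-" at all would be replaced entirely by "MYD".

```r
df <- data.frame(V1 = c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK",
                        "Sec_PWq-FTF", "NewSec-LPO-WE"), stringsAsFactors = FALSE)

# ^[^_-]* matches the leading characters up to (not including) the first _ or -,
# so the separator itself is left untouched
df$V2 <- sub("^[^_-]*", "MYD", df$V1)
df
```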

Replace either 'Sec' or 'NewSec':

df$V2 <- sub('Sec|NewSec', 'MYD', df$V1)
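Because sub() replaces only the first match, this works for the sample data, but the pattern is not anchored, so a string such as "ABC_Sec" (a hypothetical input, not from the question) would also be rewritten. A slightly stricter sketch anchors an optional "New" prefix to the start of the string:

```r
x <- c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK", "Sec_PWq-FTF",
       "NewSec-LPO-WE", "ABC_Sec")

# Unanchored: matches "Sec" or "NewSec" anywhere in the string
unanchored <- sub("Sec|NewSec", "MYD", x)

# ^ anchors the match to the start; (New)? makes the "New" part optional
anchored <- sub("^(New)?Sec", "MYD", x)

unanchored[6]  # "ABC_MYD"
anchored[6]    # "ABC_Sec"
```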

Upvotes: 2

Eric

Reputation: 2849

Here is a solution using the dplyr and stringr packages.

The string_2 regular expression matches everything up to (but not including) the first _ or - and replaces it with MYD.

The string_3 regular expression removes everything up to and including the first _ or -.

library(dplyr)
library(stringr)

df <- tibble(string_1 = c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK", "Sec_PWq-FTF", "NewSec-LPO-WE"))


df %>% 
  mutate(
    string_2 = str_replace(string_1, pattern = "^.*?(?=-|_)", "MYD"),
    string_3 = str_remove(string_1, pattern = "^.*?(_|-)")
    )

#> # A tibble: 5 x 3
#>   string_1      string_2    string_3
#>   <chr>         <chr>       <chr>   
#> 1 Sec_GHTY_WE   MYD_GHTY_WE GHTY_WE 
#> 2 NewSec_JOL_ru MYD_JOL_ru  JOL_ru  
#> 3 Sec-KIH-YRK   MYD-KIH-YRK KIH-YRK 
#> 4 Sec_PWq-FTF   MYD_PWq-FTF PWq-FTF 
#> 5 NewSec-LPO-WE MYD-LPO-WE  LPO-WE

Created on 2020-11-16 by the reprex package (v0.3.0)
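Since the question asks for base R, the same two patterns can be passed to sub(). The lookahead (?=-|_) needs perl = TRUE, because R's default regex engine does not support lookarounds. This is a sketch mirroring the stringr calls above, not part of the original answer.

```r
string_1 <- c("Sec_GHTY_WE", "NewSec_JOL_ru", "Sec-KIH-YRK", "Sec_PWq-FTF", "NewSec-LPO-WE")

# Replace everything before the first - or _ (the lookahead keeps the separator)
string_2 <- sub("^.*?(?=-|_)", "MYD", string_1, perl = TRUE)

# Remove everything up to and including the first _ or -
string_3 <- sub("^.*?(_|-)", "", string_1, perl = TRUE)

data.frame(string_1, string_2, string_3)
```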

Upvotes: 2

akrun

Reputation: 887541

We can use str_replace from stringr:

library(stringr)
df$V2 <- str_replace(df$V1, 'Sec|NewSec', 'MYD')

Upvotes: 0
