nagpal826

Reputation: 67

Regex for strsplit in R that splits each string only once, at the first occurrence of a specific character?

I have a character vector of strings:

string <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L")

I need to split the strings so that the result looks like this:

"SPG_L", "subgenual_ACC_R", "SPG_R", "MTG_L_pole", "MTG_L_pole", "CerebellumGM_L"

I tried the following regex to split the strings:

str_split(string,'(?<=[[RL]|pole])_')

But this leads to:

"SPG_L", "subgenual", "ACC_R", "SPG_R", "MTG_L", "pole", "MTG_L", "pole", "CerebellumGM_L"
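A likely explanation for this output, assuming stringr's ICU engine reads the nested brackets in [[RL]|pole] as a set union: the lookbehind then becomes a single character class matching any one of R, L, |, p, o, l or e, not the word "pole". That is why "subgenual_" is split (it ends in "l") and why "pole" is split off on its own. The minimal base-R sketch below reproduces the same splits with that class written out flat as [RL|pole]; perl = TRUE is used because PCRE does not parse the nested form the same way ICU does.

```r
string <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L")

# The lookbehind is a one-character class: any of R, L, |, p, o, l, e.
# It is NOT the alternation "R or L or pole", so "subgenual_" also splits.
buggy <- strsplit(string, "(?<=[RL|pole])_", perl = TRUE)
print(buggy)
```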

How do I edit the regex so that each string is split only once: at the "_" after the first "R" or "L", or, if that first "R" or "L" is followed by "_pole", at the "_" after "pole"?

Upvotes: 2

Views: 202

Answers (3)

Wiktor Stribiżew

Reputation: 626748

I suggest a matching approach using

^(.*?[RL](?:_pole)?)_(.*)


Details

  • ^ - start of string
  • (.*?[RL](?:_pole)?) - Group 1:
    • .*? - zero or more characters other than line break characters, as few as possible
    • [RL](?:_pole)? - R or L, optionally followed by _pole
  • _ - an underscore
  • (.*) - Group 2: zero or more characters other than line break characters, as many as possible
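To see what each group captures, here is a minimal base-R sketch (no stringr, not part of the original answer) that applies the same pattern with sub() and keeps Group 1 or Group 2 through the backreferences \1 and \2. It assumes every string matches: sub() returns a non-matching string unchanged, so such a string would show up twice in the result.

```r
x <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L")
pattern <- "^(.*?[RL](?:_pole)?)_(.*)"

# Group 1: shortest prefix ending in R or L, plus an optional "_pole"
left  <- sub(pattern, "\\1", x, perl = TRUE)
# Group 2: everything after the underscore that follows Group 1
right <- sub(pattern, "\\2", x, perl = TRUE)

# Interleave the halves (left1, right1, left2, right2, ...) into one flat vector
result <- as.vector(rbind(left, right))
print(result)
```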

See the R demo:

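library(stringr)
x <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L", "SFG_pole_R_IFG_triangularis_L", "SFG_pole_R_IFG_opercularis_L")

# Each result is a matrix: column 1 is the full match, columns 2-3 are Groups 1 and 2
res <- str_match_all(x, "^(.*?[RL](?:_pole)?)_(.*)")
# Drop the full match and keep only the two captured parts
lapply(res, function(m) m[-1])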

Output:

[[1]]
[1] "SPG_L"           "subgenual_ACC_R"

[[2]]
[1] "SPG_R"      "MTG_L_pole"

[[3]]
[1] "MTG_L_pole"     "CerebellumGM_L"

[[4]]
[1] "SFG_pole_R"         "IFG_triangularis_L"

[[5]]
[1] "SFG_pole_R"        "IFG_opercularis_L"
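If stringr is not available, base R's regexec() and regmatches() can play the role of str_match_all() here. This is a sketch, not part of the original answer; perl = TRUE selects the PCRE engine, which supports the non-capturing (?:...) group used in the pattern.

```r
x <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L",
       "SFG_pole_R_IFG_triangularis_L", "SFG_pole_R_IFG_opercularis_L")
pattern <- "^(.*?[RL](?:_pole)?)_(.*)"

# For each string: the full match followed by the capture groups
# (character(0) if the string does not match at all)
matches <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Drop the full match and keep Group 1 and Group 2
res <- lapply(matches, function(m) m[-1])
print(res)
```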

Upvotes: 2

Onyambu

Reputation: 79208

You could use sub() and then strsplit(), as shown:

strsplit(sub("^.*?[LR](?:_pole)?\\K_",":",string,perl=TRUE),":")
[[1]]
[1] "SPG_L"           "subgenual_ACC_R"

[[2]]
[1] "SPG_R"      "MTG_L_pole"

[[3]]
[1] "MTG_L_pole"     "CerebellumGM_L"
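The one-liner works in two steps, sketched below on the same string vector. sub() replaces only the first match in each string, and \K (PCRE only, hence perl = TRUE) drops everything matched so far from the reported match, so only the underscore itself is replaced by ":". strsplit() then splits at that marker. This assumes ":" never occurs in the data; the backreference version without \K is an equivalent alternative, not part of the original answer.

```r
string <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L")

# Step 1: mark the first split point; \K keeps the prefix out of the replacement
marked <- sub("^.*?[LR](?:_pole)?\\K_", ":", string, perl = TRUE)
print(marked)

# Equivalent without \K: capture the prefix and put it back with \\1
marked2 <- sub("^(.*?[LR](?:_pole)?)_", "\\1:", string, perl = TRUE)
stopifnot(identical(marked, marked2))

# Step 2: split each string at the ":" marker (fixed = TRUE: literal, not a regex)
parts <- strsplit(marked, ":", fixed = TRUE)
print(parts)
```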

Upvotes: 0

svenhalvorson

Reputation: 1080

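library(stringr)
library(magrittr)  # for %>%

# Second pass: a string that was not split at "pole_" is split once
# at the first "_" that follows R or L
split_again = function(x){
  if(length(x) > 1){
    return(x)
  }
  else{
    str_split(
      string = x,
      pattern = '(?<=[RL])_',
      n = 2)[[1]]
  }
}
# First pass: split each string once at the first "_" that follows "pole"
str_split(
  string = string,
  pattern = '(?<=pole)_',
  n = 2) %>%
  lapply(split_again) %>%
  unlist()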
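Base R's strsplit() has no counterpart to str_split()'s n argument. A minimal sketch of the same two-pass idea without stringr, using a hypothetical helper split_once() (not from any package) built on regexpr() and substr(), which splits each string at most once at the first match of a pattern:

```r
# Hypothetical helper: split each string at most once, at the first match
# of `pattern`, similar in spirit to stringr::str_split(..., n = 2)
split_once <- function(x, pattern) {
  pos <- regexpr(pattern, x, perl = TRUE)
  len <- attr(pos, "match.length")
  lapply(seq_along(x), function(i) {
    if (pos[i] == -1) return(x[i])  # no match: keep the string whole
    c(substr(x[i], 1, pos[i] - 1),
      substr(x[i], pos[i] + len[i], nchar(x[i])))
  })
}

string <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L")

# Try "pole_" first; if that did not split the string, split after R or L
res <- unlist(lapply(string, function(s) {
  parts <- split_once(s, "(?<=pole)_")[[1]]
  if (length(parts) > 1) parts else split_once(s, "(?<=[RL])_")[[1]]
}))
print(res)
```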

Upvotes: 0
