motapekog

Reputation: 13

Extract only the characters between the opening and closing parentheses at the start and end of a string in R

I have many strings that all have the following format:

mystrings <- c(
  "(ABFUHIASH)THISISAVERYLONGSTRINGWITHOUTANYSPACES(ENDING)",
  "(SECONDSTR)YETANOTHERBORINGSTRINGWITHOUTSPACES(RANDOMENDING)", 
  "(JOWERIC)THISPARTSHOULDNOTBEEXTRACTED(GETTHIS)", 
  "(CAPTURETHIS)IOJSDOIOIADSNCXZZCX(IJFAI)"
)

I need to capture the strings that are inside parentheses both at the start and the end of the original mystrings.

Therefore, variable start will store the starting characters for each of the above strings with the same index. The result will be this:

start[1]
ABFUHIASH

start[2]
SECONDSTR

start[3]
JOWERIC

start[4]
CAPTURETHIS

And similarly, the ending for each string in mystrings will be saved into end:

end[1]
ENDING

end[2]
RANDOMENDING

end[3]
GETTHIS

end[4]
IJFAI

Parentheses themselves should NOT be captured.

Is there a way/function to do this quickly in R?

I have tried stringr::word and stringi::stri_extract, but I am getting very strange results.

Upvotes: 0

Views: 54

Answers (2)

MrFlick

Reputation: 206197

We can use the stringr library for this. For example

library(stringr)
mm <- str_match(mystrings, "^\\(([^)]+)\\).*\\(([^)]+)\\)$")
mm

The pattern anchors on the parentheses at the beginning and end of the string and puts the text between each pair into a capture group, so both pieces can be extracted easily.

str_match returns a character matrix whose first column is the full match, so you seem to want just the 2nd and 3rd columns: mm[, 2:3]

     [,1]          [,2]          
[1,] "ABFUHIASH"   "ENDING"      
[2,] "SECONDSTR"   "RANDOMENDING"
[3,] "JOWERIC"     "GETTHIS"     
[4,] "CAPTURETHIS" "IJFAI"
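
If stringr is not available, the same pattern should also work in base R with sub and backreferences. This is a minimal sketch, not part of the original answer: the variable name pattern is only illustrative, and start and end are the names used in the question. Note that sub returns a string unchanged when the pattern does not match, so this assumes every input has the "(…)…(…)" shape.

```r
mystrings <- c(
  "(ABFUHIASH)THISISAVERYLONGSTRINGWITHOUTANYSPACES(ENDING)",
  "(SECONDSTR)YETANOTHERBORINGSTRINGWITHOUTSPACES(RANDOMENDING)",
  "(JOWERIC)THISPARTSHOULDNOTBEEXTRACTED(GETTHIS)",
  "(CAPTURETHIS)IOJSDOIOIADSNCXZZCX(IJFAI)"
)

# Same regular expression as in the str_match call above
pattern <- "^\\(([^)]+)\\).*\\(([^)]+)\\)$"

# Replace the whole string with the first or second capture group
start <- sub(pattern, "\\1", mystrings)
end   <- sub(pattern, "\\2", mystrings)
```

Both results are plain character vectors with the same length and order as mystrings, so start[1] and end[1] refer to the first input string.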

Upvotes: 2

Brian Davis

Reputation: 992

Something like this might work for you:

> regmatches(mystrings,gregexpr("\\(.+?\\)",mystrings))
[[1]]
[1] "(ABFUHIASH)" "(ENDING)"   

[[2]]
[1] "(SECONDSTR)"    "(RANDOMENDING)"

[[3]]
[1] "(JOWERIC)" "(GETTHIS)"

[[4]]
[1] "(CAPTURETHIS)" "(IJFAI)"

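E.g., to extract the endings, store that result in x and take the last element of each list entry (the parentheses are still included at this point):

x <- regmatches(mystrings, gregexpr("\\(.+?\\)", mystrings))
lapply(x, tail, 1)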
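
Since the question asks for the text without the parentheses, the matches can be trimmed before picking the first and last one. The following is a rough sketch built on the regmatches call above, not the answerer's own code: inner is a hypothetical helper name, and start and end are the names from the question. It assumes every string contains at least one parenthesised group, because vapply raises an error for an entry with no matches.

```r
mystrings <- c(
  "(ABFUHIASH)THISISAVERYLONGSTRINGWITHOUTANYSPACES(ENDING)",
  "(SECONDSTR)YETANOTHERBORINGSTRINGWITHOUTSPACES(RANDOMENDING)",
  "(JOWERIC)THISPARTSHOULDNOTBEEXTRACTED(GETTHIS)",
  "(CAPTURETHIS)IOJSDOIOIADSNCXZZCX(IJFAI)"
)

# All "(...)" groups per string, parentheses included
x <- regmatches(mystrings, gregexpr("\\(.+?\\)", mystrings))

# Drop the first and last character of each match, i.e. "(" and ")"
inner <- lapply(x, function(m) substr(m, 2, nchar(m) - 1))

# First and last group of each string as plain character vectors
start <- vapply(inner, head, character(1), n = 1)
end   <- vapply(inner, tail, character(1), n = 1)
```

Unlike the anchored pattern in the other answer, this picks up parenthesised groups anywhere in the string, so a group in the middle would be ignored but a string that does not start or end with one would still yield a result.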

Upvotes: 0
