R Capturing String inside Brackets

Question

I'm trying to parse some of my chess PGN data, but I'm having trouble capturing just the characters inside a single pair of brackets.

testString <- '[Event "?"]\n[Site "http://www.chessmaniac.com play free chess"]\n[Date "2018.08.25"]\n[Round "-"]\n[White "NothingFancy 1497"]\n[Black "JR Smith 1985"]\n[Result "1-0"]\n\n1.'

#Attempt to just get who white is, which is inside a bracket [White xxx]

findWhite <- regexpr('\\[White.*\\]', testString)

regmatches(testString, findWhite)

The stringr package seems to do what I want, but I'm curious why the same regular expression behaves differently there. I'm fine using stringr, but I'd also like to know how to do this in base R.

library(stringr)
str_extract(testString, '\\[White.*\\]')

Wiktor Stribiżew · Accepted Answer

If you need the whole match starting with [White and ending with ] you may use

regmatches(testString, regexpr("\\[White\\s*[^][]*]", testString))
[1] "[White \"NothingFancy 1497\"]"

If you only need the quoted value (double quotes included):

regmatches(testString, regexpr("\\[White\\s*\\K[^][]*", testString, perl=TRUE))
[1] "\"NothingFancy 1497\""


To strip the double quotes, you may use something like

regmatches(testString, regexpr('\\[White\\s*"\\K.*(?="])', testString, perl=TRUE))
[1] "NothingFancy 1497"

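
As an alternative to \K and the lookahead, a capture group with regexec also returns the unquoted value without needing perl=TRUE. The sketch below is only one possible way to wrap this; pgn_tag is a hypothetical helper name, not part of base R or of the answer, and it assumes the tag name is plain text with no regex metacharacters.

```r
# The sample PGN header from the question.
testString <- '[Event "?"]\n[Site "http://www.chessmaniac.com play free chess"]\n[Date "2018.08.25"]\n[Round "-"]\n[White "NothingFancy 1497"]\n[Black "JR Smith 1985"]\n[Result "1-0"]\n\n1.'

# pgn_tag() is a hypothetical helper (an assumption for illustration).
# It returns the quoted value of one PGN tag, or NA if the tag is absent.
pgn_tag <- function(pgn, tag) {
  # The tag name is pasted into the pattern as-is, so it must be plain text.
  pattern <- paste0("\\[", tag, "\\s*\"([^\"]*)\"\\s*\\]")
  m <- regmatches(pgn, regexec(pattern, pgn))[[1]]
  # m[1] is the whole match; m[2] is the first capture group.
  if (length(m) < 2) NA_character_ else m[2]
}

white <- pgn_tag(testString, "White")
black <- pgn_tag(testString, "Black")
print(white)
print(black)
```

Because the pattern requires a double quote right after the optional whitespace, asking for "White" should not accidentally pick up a longer tag name that merely starts with the same letters.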

Details

\[ - a [ char
White - a literal substring
\s* - 0+ whitespaces
\K - match reset operator discarding the text matched so far
[^][]* - 0+ chars other than [ and ]
.* (in the other version) - matches any 0+ chars other than line break chars, as many as possible
(?="]) - a positive lookahead that matches a position inside a string that is immediately followed with "].
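
The note on .* above also suggests why the pattern from the question behaves differently in base R and in stringr. As far as I know, the default base R engine (TRE) lets . match a line break, so the greedy .* runs on to the last ] in the whole string, while PCRE (perl=TRUE) stops . at a line break; stringr uses the ICU engine, which by default behaves like PCRE in this respect. A minimal sketch to check this on the sample data, using only base R:

```r
# The sample PGN header from the question.
testString <- '[Event "?"]\n[Site "http://www.chessmaniac.com play free chess"]\n[Date "2018.08.25"]\n[Round "-"]\n[White "NothingFancy 1497"]\n[Black "JR Smith 1985"]\n[Result "1-0"]\n\n1.'

# Default engine (TRE): "." may match "\n", so ".*" can span several lines.
tre_match <- regmatches(testString, regexpr("\\[White.*\\]", testString))

# PCRE engine: "." does not match a line break, so the match stays on one line.
pcre_match <- regmatches(testString,
                         regexpr("\\[White.*\\]", testString, perl = TRUE))

# Print both results to compare how far each match extends.
cat(tre_match, "\n")
cat(pcre_match, "\n")
```

Using the negated bracket expression [^][]* from the answer sidesteps the difference, since it cannot cross a ] regardless of the engine.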
