James Holland

Reputation: 1142

R Capturing String inside Brackets

I'm trying to parse some of my chess PGN data, but I'm having trouble capturing the characters inside a single bracket.

testString <- '[Event \"?\"]\n[Site \"http://www.chessmaniac.com play free chess\"]\n[Date \"2018.08.25\"]\n[Round \"-\"]\n[White \"NothingFancy 1497\"]\n[Black \"JR Smith 1985\"]\n[Result \"1-0\"]\n\n1.'

#Attempt to just get who white is, which is inside a bracket [White xxx]

findWhite <- regexpr('\\[White.*\\]', testString)

regmatches(testString, findWhite)

The stringr package seems to do what I want, but I'm curious why the same regular expression behaves differently there. I'm fine using stringr, but I'd also like to know how to do this in base R.

library(stringr)
str_extract(testString, '\\[White.*\\]')

Upvotes: 2

Views: 284

Answers (2)

Wiktor Stribiżew

Reputation: 626728

If you need the whole match, starting with [White and ending with ], you may use

regmatches(testString, regexpr("\\[White\\s*[^][]*]", testString))
[1] "[White \"NothingFancy 1497\"]"

If you only need the quoted value (with the double quotes still included):

regmatches(testString, regexpr("\\[White\\s*\\K[^][]*", testString, perl=TRUE))
[1] "\"NothingFancy 1497\""


To strip the double quotes, you may use something like

regmatches(testString, regexpr('\\[White\\s*"\\K.*(?="])', testString, perl=TRUE))
[1] "NothingFancy 1497"


Details

  • \\[ - a [ char
  • White - a literal substring
  • \\s* - 0+ whitespaces
  • \\K - match reset operator discarding the text matched so far
  • [^][]* - 0+ chars other than [ and ]
  • .* (in the other version) - matches any 0+ chars other than line break chars, as many as possible
  • (?="]) - a positive lookahead that matches a position inside a string that is immediately followed with "].
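As for why the original pattern behaves differently in base R and stringr, the likely cause is how `.` treats line breaks. As far as I know, the default base R engine (TRE) lets `.` match a newline, so a greedy `.*` can run on to the last `]` in the string. With perl=TRUE (PCRE), and in stringr (ICU), `.` stops at line breaks by default. A minimal sketch of the contrast, using a shortened copy of the test string from the question:

```r
# Shortened copy of the question's test string (same tag layout, fewer tags).
testString <- '[Event "?"]\n[White "NothingFancy 1497"]\n[Black "JR Smith 1985"]\n[Result "1-0"]\n\n1.'

# Default engine (TRE): '.' can match '\n', so the greedy '.*' is expected
# to continue past the White tag up to the last ']' in the string.
default_match <- regmatches(testString, regexpr('\\[White.*\\]', testString))

# PCRE engine: '.' does not match '\n' by default, so the match stays on one line.
perl_match <- regmatches(testString, regexpr('\\[White.*\\]', testString, perl = TRUE))

print(default_match)
print(perl_match)
```

That is also why the patterns above use `[^][]*` rather than `.*`: a negated bracket expression cannot run past the closing `]`, whichever engine is used.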

Upvotes: 3

G5W

Reputation: 37641

One way to do it in base R is to use sub and keep only the part that you want.

sub(".*\\[White\\s*(.*?)\\].*", "\\1", testString)
[1] "\"NothingFancy 1497\""
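One caveat with sub is that it returns the input unchanged when the pattern does not match. If that matters, a capture group with regexec is an alternative. The helper below is only a sketch: `get_pgn_tag` is a hypothetical name, not part of any package, and it assumes the tag name contains no regex metacharacters and the value contains no embedded double quotes.

```r
# Hypothetical helper: return the quoted value of one PGN tag, or NA if absent.
# Assumes `tag` is a plain word (no regex metacharacters).
get_pgn_tag <- function(pgn, tag) {
  # Builds a regex such as: \[White\s+"([^"]*)"\]
  pattern <- paste0('\\[', tag, '\\s+"([^"]*)"\\]')
  # regexec() reports the full match plus capture groups; regmatches() extracts them.
  m <- regmatches(pgn, regexec(pattern, pgn))[[1]]
  if (length(m) == 0) NA_character_ else m[2]
}

# Shortened copy of the question's test string.
testString <- '[Event "?"]\n[White "NothingFancy 1497"]\n[Black "JR Smith 1985"]\n[Result "1-0"]\n\n1.'

white <- get_pgn_tag(testString, "White")
missing_tag <- get_pgn_tag(testString, "Annotator")
print(white)
print(missing_tag)
```

Because the value is taken from the capture group, the double quotes are already stripped, and a missing tag gives NA rather than echoing the whole input.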

Upvotes: 1
