Reputation: 1142
I'm trying to parse some of my chess PGN data, but I'm having trouble capturing only the characters inside a single pair of brackets.
testString <- '[Event \"?\"]\n[Site \"http://www.chessmaniac.com play free chess\"]\n[Date \"2018.08.25\"]\n[Round \"-\"]\n[White \"NothingFancy 1497\"]\n[Black \"JR Smith 1985\"]\n[Result \"1-0\"]\n\n1.'
#Attempt to just get who white is, which is inside a bracket [White xxx]
findWhite <- regexpr('\\[White.*\\]', testString)
regmatches(testString, findWhite)
The stringr package seems to do what I want, but I'm curious what is different about the use of the same regular expression. I'm fine using stringr, but I like to also know how to do this in base R.
library(stringr)
str_extract(testString, '\\[White.*\\]')
Upvotes: 2
Reputation: 626728
If you need the whole match starting with [White and ending with ], you may use
regmatches(testString, regexpr("\\[White\\s*[^][]*]", testString))
[1] "[White \"NothingFancy 1497\"]"
If you only need the substring inside double quotes:
regmatches(testString, regexpr("\\[White\\s*\\K[^][]*", testString, perl=TRUE))
[1] "\"NothingFancy 1497\""
See the regex demo.
To strip the double quotes, you may use something like
regmatches(testString, regexpr('\\[White\\s*"\\K.*(?="])', testString, perl=TRUE))
[1] "NothingFancy 1497"
See another regex demo and an online R demo.
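As for why the pattern from the question behaves differently in base R and stringr, here is a minimal sketch. It assumes a trimmed three-tag copy of the header (the variable names pgn, tre and pcre are only illustrative). With the default engine (TRE), . also matches a newline, so the greedy .* runs on to the last ] in the string. With perl=TRUE, . stops at a line break, which is also how stringr's ICU engine treats . by default.

```r
# Trimmed copy of the PGN header from the question (only three tags kept).
pgn <- '[Date "2018.08.25"]\n[White "NothingFancy 1497"]\n[Result "1-0"]'

# Default engine (TRE): '.' also matches "\n", so the greedy '.*'
# continues up to the last ']' in the whole string.
tre <- regmatches(pgn, regexpr('\\[White.*\\]', pgn))

# perl = TRUE (PCRE): '.' does not cross a line break, so the match
# stays on the [White ...] line.
pcre <- regmatches(pgn, regexpr('\\[White.*\\]', pgn, perl = TRUE))

print(tre)   # spans the White line and the Result line
print(pcre)  # only the White tag
```

So adding perl=TRUE to the original call, or replacing .* with a negated class such as [^][]*, keeps the match on one tag.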
Details
\\[ - a [ char
White - a literal substring
\\s* - 0+ whitespaces
\\K - match reset operator discarding the text matched so far
[^][]* - 0+ chars other than [ and ]
.* (in the other version) - matches any 0+ chars other than line break chars, as many as possible
(?="]) - a positive lookahead that matches a position inside a string that is immediately followed with "]
Upvotes: 3
Reputation: 37641
At least one way to do it in base R is to use sub and only keep the part that you want.
sub(".*\\[White\\s*(.*?)\\].*", "\\1", testString)
[1] "\"NothingFancy 1497\""
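A related base R option, sketched here as an alternative rather than as part of the answer above, is regexec, which returns the capture groups alongside the full match. The shortened header string and the names header, m and white are assumptions for illustration only.

```r
# Shortened stand-in for the PGN header in the question.
header <- '[White "NothingFancy 1497"]\n[Black "JR Smith 1985"]'

# regexec() reports the full match plus each parenthesised group;
# regmatches() turns those positions into a list of character vectors.
m <- regmatches(header, regexec('\\[White\\s*"([^"]*)"\\]', header))

# Element 1 is the whole match, element 2 is the first capture group,
# i.e. the player name without the surrounding double quotes.
white <- m[[1]][2]
print(white)
```

Because the group uses [^"]* instead of .*, the result does not depend on whether . matches a newline in the chosen regex engine.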
Upvotes: 1