Reputation: 127
I'm having trouble with (1) using a dynamic variable in a regex pattern and (2) matching "\" or new line. I'd really appreciate any help!
Example: Ultimately, by whatever means possible, I'd like to match the word Administrator in the text below. The text's class is character (it was originally a list and was coerced to character using as.character()). Here's the text snippet:
[1] "c(\"Silk Road Forums\", \"\", \"*\", \"Welcome, Guest. Please login or register.\", \"[ ] [ ] [Forever] [Login]\", \"Login with username, password and session length\", \"[ ] [Search] \", \"\", \" â\\200¢ Home\", \" â\\200¢ Search\", \" â\\200¢ Login\", \" â\\200¢ Register\", \"\", \"\", \" â\\200¢ Silk Road Forums »\", \" â\\200¢ Profile of Dread Pirate Roberts »\", \" â\\200¢ Summary\", \"\", \" â\\200¢ Profile Info\", \" â–¡ Summary\", \" â–¡ Show Stats\", \" â–¡ Show Posts...\", \" â\\230† Messages\", \n\" â\\230† Topics\", \" â\\230† Attachments\", \"\", \"[profile_sm]Summary\", \"\", \"Dread Pirate Roberts Administrator\", \"\", \"[index]\", \" â–¡ SMF | SMF © 2013, Simple Machines\"\n)"
Attempts / Problems
Tried to Match New Line: In that messy text (see above), I was able to match [profile_sm]Summary\. I tried to match what comes next in that text by using:
\\n -- failed
\\n\\r -- failed
\\n|\\r -- failed
\\r\\n -- failed
\\r|\\n -- failed
It seems like there's no new line after it, so I tried to match the literal ", (a quotation mark followed by a comma) that comes after those characters in the text. So I also tried these two and they both failed: \\ and \\"\".
Tried to Use Variable: I tried to use a variable X that holds Dread Pirate Roberts, taken from a previous regex match and turned into a vector. I tried to just put X into the regex pattern, but it obviously didn't work. Is there a way to create a pattern using X? For example: match one of the values found in X.
I would need to know how to solve both of these problems / methods for other parts of my current project and would really love pointers and guidance. Thank you!
Edit Note: Saw that folks had trouble understanding this post so I edited to make it more legible. Thanks and shout-out to @Wiktor Stribiżew for reading through the original post despite the difficult wording and providing the answer! :)
Upvotes: 0
Views: 67
Reputation: 626870
Your text contains only two newlines. You can easily check this using cat(text), which prints three lines:
c("Silk Road Forums", "", "*", "Welcome, Guest. Please login or register.", "[ ] [ ] [Forever] [Login]", "Login with username, password and session length", "[ ] [Search] ", "", " � Home", " � Search", " � Login", " � Register", "", "", " � Silk Road Forums »", " � Profile of Dread Pirate Roberts »", " � Summary", "", " � Profile Info", " □ Summary", " □ Show Stats", " □ Show Posts...", " � Messages",
" � Topics", " � Attachments", "", "[profile_sm]Summary", "", "Dread Pirate Roberts Administrator", "", "[index]", " □ SMF | SMF © 2013, Simple Machines"
)
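The same check can be done programmatically instead of reading the cat() output. Below is a minimal sketch that uses a shortened stand-in for the text (an assumption, not the full string from the question) to count the newline characters and to show which characters directly follow [profile_sm]Summary:

```r
# Shortened stand-in for the question's text (assumed sample, not the full string)
text <- "c(\"Silk Road Forums\", \"Messages\", \n\"Topics\", \"[profile_sm]Summary\", \"\", \"Dread Pirate Roberts Administrator\"\n)"

# Count newline characters; fixed = TRUE treats "\n" as a plain character, not a regex
newline_pos <- gregexpr("\n", text, fixed = TRUE)[[1]]
n_newlines <- sum(newline_pos > 0)   # gregexpr returns -1 when nothing matches
n_newlines

# Locate the literal string and look at the four characters right after it
pos <- regexpr("[profile_sm]Summary", text, fixed = TRUE)
end <- as.integer(pos) + attr(pos, "match.length")   # index of first char after the match
after <- substr(text, end, end + 3)
after   # a quote, a comma, a space and a quote: no newline here
```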
So, as you can see, there is no newline after [profile_sm]Summary. Note that to match [ in a regex pattern you need to escape it. What follows is a mix of " characters, commas and spaces; you may match these chars using the [,"\s]+ pattern. The X variable will hold Dread Pirate Roberts, so, to extract Administrator, you may use
\[profile_sm]Summary[",\s]*Dread Pirate Roberts\s+\K[^"]+
See the regex demo.
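When this regex is written as an R string literal, every backslash has to be doubled. A small sketch of the two spellings follows; the variable names are made up for illustration, and the raw-string form assumes R 4.0.0 or later:

```r
# Ordinary string literal: each regex backslash is written as two backslashes
pattern_escaped <- '\\[profile_sm]Summary[",\\s]*Dread Pirate Roberts\\s+\\K[^"]+'

# Raw string literal (assumes R >= 4.0.0): backslashes are written once
pattern_raw <- r"(\[profile_sm]Summary[",\s]*Dread Pirate Roberts\s+\K[^"]+)"

# Both spellings produce the same string
same <- identical(pattern_escaped, pattern_raw)
same

# writeLines() shows the pattern as the regex engine receives it
writeLines(pattern_escaped)
```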
Details
\[profile_sm]Summary - [profile_sm]Summary string
[",\s]* - 0+ ", , or whitespace chars
Dread Pirate Roberts - a literal string
\s+ - 1+ whitespaces
\K - match reset operator that discards text matched so far in the match memory buffer
[^"]+ - 1+ chars other than ". If you need to only match letters, digits or _ you may use \w+ instead of this pattern (with \\ in the string literal).
R demo:
text <- "c(\"Silk Road Forums\", \"\", \"*\", \"Welcome, Guest. Please login or register.\", \"[ ] [ ] [Forever] [Login]\", \"Login with username, password and session length\", \"[ ] [Search] \", \"\", \" â\200¢ Home\", \" â\200¢ Search\", \" â\200¢ Login\", \" â\200¢ Register\", \"\", \"\", \" â\200¢ Silk Road Forums »\", \" â\200¢ Profile of Dread Pirate Roberts »\", \" â\200¢ Summary\", \"\", \" â\200¢ Profile Info\", \" â–¡ Summary\", \" â–¡ Show Stats\", \" â–¡ Show Posts...\", \" â\230† Messages\", \n\" â\230† Topics\", \" â\230† Attachments\", \"\", \"[profile_sm]Summary\", \"\", \"Dread Pirate Roberts Administrator\", \"\", \"[index]\", \" â–¡ SMF | SMF © 2013, Simple Machines\"\n)"
X <- "Dread Pirate Roberts"
regex <- paste0('\\[profile_sm]Summary[",\\s]*',X,'\\s+\\K[^"]+')
regmatches(text, regexpr(regex, text, perl=TRUE))
## => [1] "Administrator"
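If X holds several values and any one of them should match, one option is to join them into an alternation. The sketch below is only an illustration under stated assumptions: the second name in X and both sample strings are made up, and each value is wrapped in \Q...\E (PCRE literal quoting, so perl=TRUE is required) so that characters such as . or ( in a name are not treated as regex syntax. This breaks if a value itself contains \E.

```r
# Hypothetical vector of names from an earlier match (second value is invented)
X <- c("Dread Pirate Roberts", "Some.Other (User)")

# Quote each value literally with \Q...\E and join the values with |
alternation <- paste0("\\Q", X, "\\E", collapse = "|")
regex <- paste0('\\[profile_sm]Summary[",\\s]*(?:', alternation, ')\\s+\\K[^"]+')

# Shortened sample strings (assumed, modelled on the text in the question)
texts <- c(
  "\"[profile_sm]Summary\", \"\", \"Dread Pirate Roberts Administrator\", \"\"",
  "\"[profile_sm]Summary\", \"\", \"Some.Other (User) Moderator\", \"\""
)

# perl = TRUE is needed for \K and \Q...\E
roles <- regmatches(texts, regexpr(regex, texts, perl = TRUE))
roles
```

Wrapping the alternation in a non-capturing group (?:...) keeps the | from splitting the rest of the pattern.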
Upvotes: 1