Reputation: 1326
I'm trying to parse an XML document (specifically, a Sublime color theme) and am using a negative lookahead to prevent a match I don't want, but it doesn't appear to be working correctly.
The pattern is as follows:
/
<key>name<\/key>
.*? # find as little as possible including new lines
<string>(.*?)<\/string> # Match the name of this color Rule
.*?
<dict>
((?!<\/dict>).)*? # After the second opening <dict>, do not allow a closing dict tag
<key>foreground<\/key>
.*?
<string>(.*?)<\/string> # Match the hex code for the name found in Match 1.
/mx # m: treat a newline as a character matched by .
# x: ignore whitespace and allow comments.
The string that is being matched is:
<dict>
<key>name</key>
<string>Variable</string>
<key>scope</key>
<string>variable</string>
<key>settings</key>
<dict>
<key>fontStyle</key>
<string></string>
</dict>
</dict>
<dict>
<key>name</key>
<string>Keyword</string>
<key>scope</key>
<string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
<key>settings</key>
<dict>
<key>foreground</key>
<string>#F92672</string>
The entire string is matched, with Variable as the first captured group and #F92672 as the second. Ideally, I'd like the first captured group to be Keyword, from the second section. I assumed the negative lookahead would keep the first section out of the match, because it would see the </dict> and fail to match.
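A minimal Ruby reproduction of this behavior, assuming the fragment above is held in a string (the name `theme_xml` is hypothetical). Note that `((?!<\/dict>).)` is itself a capturing group, so in Ruby the color actually lands in group 3. The comments inside the pattern avoid a bare slash, because an unescaped `/` would end the `/.../` literal early.

```ruby
# Hypothetical variable name; the content is the fragment from the question.
theme_xml = <<~XML
  <dict>
    <key>name</key>
    <string>Variable</string>
    <key>scope</key>
    <string>variable</string>
    <key>settings</key>
    <dict>
      <key>fontStyle</key>
      <string></string>
    </dict>
  </dict>
  <dict>
    <key>name</key>
    <string>Keyword</string>
    <key>scope</key>
    <string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
    <key>settings</key>
    <dict>
      <key>foreground</key>
      <string>#F92672</string>
XML

# The pattern from the question. Comments avoid a bare slash, because an
# unescaped slash would end the regexp literal early.
pattern = /
  <key>name<\/key>
  .*?                      # m flag: . also matches newlines
  <string>(.*?)<\/string>  # group 1: the rule name
  .*?
  <dict>
  ((?!<\/dict>).)*?        # group 2: one character per repetition
  <key>foreground<\/key>
  .*?
  <string>(.*?)<\/string>  # group 3: the color
/mx

m = theme_xml.match(pattern)
puts m[1]  # Variable
puts m[3]  # #F92672
```

The match starts at the Variable rule; when the lookahead blocks the path through its settings dict, the engine backtracks into the `.*?` before `<dict>` and extends it to the Keyword rule's outer `<dict>` instead.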
Does anyone know if I'm doing it wrong and what I can do to fix it? Thanks!
Upvotes: 0
Views: 288
Reputation: 9177
The first dict (with the string Variable) and the second (with Keyword) have the same structure. You want to tell them apart with a negative lookahead, but that is not possible.
Change ((?!<\/dict>).)*? to (((?!<\/dict>).)*?) to debug it, and you can see that the new group's content is:
result="
<key>name</key>
<string>Keyword</string>
<key>scope</key>
<string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
<key>settings</key>
<dict>
"
This satisfies your negative lookahead, because it contains no </dict>.
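A Ruby sketch of the debugging step above, run against the question's fragment (held in a string whose name, `theme_xml`, is hypothetical). Wrapping the repetition adds a group, so the rule name stays in group 1, the wrapped text is group 2, and the color moves to group 4.

```ruby
# Hypothetical variable name; the content is the fragment from the question.
theme_xml = <<~XML
  <dict>
    <key>name</key>
    <string>Variable</string>
    <key>scope</key>
    <string>variable</string>
    <key>settings</key>
    <dict>
      <key>fontStyle</key>
      <string></string>
    </dict>
  </dict>
  <dict>
    <key>name</key>
    <string>Keyword</string>
    <key>scope</key>
    <string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
    <key>settings</key>
    <dict>
      <key>foreground</key>
      <string>#F92672</string>
XML

# The question's pattern with the tempered repetition wrapped in one more
# capturing group, so its whole text can be inspected.
debug = /
  <key>name<\/key>
  .*?
  <string>(.*?)<\/string>   # group 1: the rule name
  .*?
  <dict>
  (((?!<\/dict>).)*?)       # group 2: everything the lookahead let through
  <key>foreground<\/key>
  .*?
  <string>(.*?)<\/string>   # group 4: the color
/mx

m = theme_xml.match(debug)
puts m[1]                      # Variable
puts m[2]                      # the Keyword rule, up to its settings <dict>
puts m[2].include?("</dict>")  # false
puts m[4]                      # #F92672
```

Group 2 runs from the Keyword rule's outer `<dict>` up to `<key>foreground</key>`, which is why the lookahead never objects.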
Even if you add more conditions (using only the structure as a condition, not the contents), Variable will always come before #F92672, because both dicts have the same structure.
So using an XML parser may be a better choice.
Upvotes: 0
Reputation: 37517
Here's a way to do it with Nokogiri:
require 'nokogiri'
# xml is a String holding the fragment from the question
theme = Nokogiri::XML.fragment(xml)
puts theme.xpath('./dict[1]/key[text()="name"]/following-sibling::string[1]').text
#=> "Variable"
puts theme.xpath('.//dict[preceding-sibling::key[1][text()="settings"]]/string').text
#=> "#F92672"
The first XPath takes the first dict, finds the key containing "name", and then takes the text of the following string element.
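The second XPath looks for a dict immediately after a key containing "settings" and retrieves the text of its string element. It matches both settings dicts, but the first one's string is empty, so only #F92672 appears in the combined text.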
Note that if you're parsing a full document rather than the given fragment, you'll need to make a few changes, such as changing the call to theme = Nokogiri::XML.parse(xml) and removing the leading . from the XPath expressions.
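For the question's actual goal, pairing each rule's name with its foreground color, a parser can walk every rule instead of relying on a single match. The sketch below swaps Nokogiri for REXML, which ships with Ruby, so it runs without an extra gem. The completed, `<array>`-wrapped XML is a hypothetical sample, and `plist_dict` is a hypothetical helper, not part of either library.

```ruby
require "rexml/document"

# Hypothetical sample: the two rules from the question, closed properly and
# wrapped in one <array> root, since REXML needs a single root element.
xml = <<~XML
  <array>
    <dict>
      <key>name</key>
      <string>Variable</string>
      <key>scope</key>
      <string>variable</string>
      <key>settings</key>
      <dict>
        <key>fontStyle</key>
        <string></string>
      </dict>
    </dict>
    <dict>
      <key>name</key>
      <string>Keyword</string>
      <key>scope</key>
      <string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
      <key>settings</key>
      <dict>
        <key>foreground</key>
        <string>#F92672</string>
      </dict>
    </dict>
  </array>
XML

# Hypothetical helper: a plist <dict> alternates <key> and value elements,
# so pair them up into a Hash of key text => value element.
def plist_dict(dict)
  dict.elements.to_a.each_slice(2).to_h { |key, value| [key.text, value] }
end

doc = REXML::Document.new(xml)
colors = {}
doc.root.each_element("dict") do |rule|
  fields = plist_dict(rule)
  settings = fields["settings"]
  next unless settings

  foreground = plist_dict(settings)["foreground"]
  colors[fields["name"].text] = foreground.text if foreground
end

colors.each { |name, hex| puts "#{name}: #{hex}" }  # Keyword: #F92672
```

Walking every rule this way returns Keyword's color and simply skips Variable, which has no foreground. The same key/value pairing should carry over to Nokogiri's `Node#element_children` if you prefer that gem.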
Upvotes: 1