mcheah

Reputation: 1326

Negative lookahead in Ruby with preceding and following matches

I'm trying to parse an XML document (specifically a Sublime color theme) with a regular expression, and I'm using a negative lookahead to prevent a match I don't want, but it doesn't appear to work correctly.

The pattern is as follows:

/
<key>name<\/key>
.*?                     # find as little as possible including new lines
<string>(.*?)<\/string> # Match the name of this color Rule
.*?
<dict>
((?!<\/dict>).)*?       # After the second opening <dict>, do not allow a closing <\/dict>
<key>foreground<\/key>  
.*?
<string>(.*?)<\/string> # Match the hex code for the name found in Match 1.
/mx                     # m: let . match newlines too
                        # x: ignore whitespace and comments in the pattern

The string that is being matched is:

    <dict>
        <key>name</key>
        <string>**Variable**</string>
        <key>scope</key>
        <string>variable</string>
        <key>settings</key>
        <dict>
            <key>fontStyle</key>
            <string></string>
        </dict>
    </dict>

    <dict>
        <key>name</key>
        <string>Keyword</string>
        <key>scope</key>
        <string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
        <key>settings</key>
        <dict>
            <key>foreground</key>
            <string>**#F92672**</string>

The entire string is matched, with **Variable** as the first captured group and **#F92672** as the second. Ideally, I'd like the first captured group to be Keyword, from the second section. I assumed that the negative lookahead would keep the first section out of the match, because it would see the </dict> and fail to match.
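A minimal Ruby sketch that reproduces the behaviour, assuming a trimmed copy of the sample (the literal ** markers and the scope entries are dropped, and the second block is closed so the text is complete). Note that ((?!<\/dict>).)*? is itself a capturing group, so the hex code ends up in group 3 rather than group 2. The comments inside the pattern avoid an unescaped slash, because in a Ruby /…/ literal a slash inside an x-mode comment still ends the literal.

```ruby
# Trimmed sample (assumption: ** markers and scope entries removed).
xml = <<~'XML'
  <dict>
      <key>name</key>
      <string>Variable</string>
      <key>settings</key>
      <dict>
          <key>fontStyle</key>
          <string></string>
      </dict>
  </dict>
  <dict>
      <key>name</key>
      <string>Keyword</string>
      <key>settings</key>
      <dict>
          <key>foreground</key>
          <string>#F92672</string>
      </dict>
  </dict>
XML

# The pattern from the question.
PATTERN = /
  <key>name<\/key>
  .*?                      # as little as possible, including newlines
  <string>(.*?)<\/string>  # group 1: the rule name
  .*?
  <dict>
  ((?!<\/dict>).)*?        # group 2: one character per step, never a closing dict
  <key>foreground<\/key>
  .*?
  <string>(.*?)<\/string>  # group 3: the hex colour
/mx

m = xml.match(PATTERN)
puts m[1] # name taken from the FIRST section
puts m[3] # colour taken from the SECOND section
```

The match starts at the first <key>name</key>, so the name comes from the first section even though the colour comes from the second.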

Does anyone know what I'm doing wrong and how I can fix it? Thanks!

Upvotes: 0

Views: 288

Answers (2)

aristotll

Reputation: 9177

The first dict (with the string **Variable**) and the second (with Keyword) have the same structure, and you want to distinguish them with a negative lookahead, but that is not possible.

Change ((?!<\/dict>).)*? to (((?!<\/dict>).)*?) for debugging, and you can see that the new group's content is:

result="
        <key>name</key>
        <string>Keyword</string>
        <key>scope</key>
        <string>keyword - (source.c keyword.operator | source.c++ keyword.operator | source.objc keyword.operator | source.objc++ keyword.operator), keyword.operator.word</string>
        <key>settings</key>
        <dict>
            "

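This text contains no </dict>, so it satisfies your negative lookahead. The lazy .*? before <dict> is free to run past the first section's </dict> until it reaches the second section's <dict>; the lookahead only checks the text that comes after that <dict>.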
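The same debugging step as a small Ruby sketch, assuming a trimmed copy of the question's sample (** markers and scope entries dropped). Instead of an extra numbered group it uses named groups (the names name, gap and hex are illustrative), which is equivalent but easier to read.

```ruby
# Trimmed sample (assumption: ** markers and scope entries removed).
xml = <<~'XML'
  <dict>
      <key>name</key>
      <string>Variable</string>
      <key>settings</key>
      <dict>
          <key>fontStyle</key>
          <string></string>
      </dict>
  </dict>
  <dict>
      <key>name</key>
      <string>Keyword</string>
      <key>settings</key>
      <dict>
          <key>foreground</key>
          <string>#F92672</string>
      </dict>
  </dict>
XML

# Debug version: the guarded repetition is wrapped in the named group "gap".
DEBUG = /
  <key>name<\/key>
  .*?
  <string>(?<name>.*?)<\/string>
  .*?
  <dict>
  (?<gap>(?:(?!<\/dict>).)*?)
  <key>foreground<\/key>
  .*?
  <string>(?<hex>.*?)<\/string>
/mx

m = xml.match(DEBUG)
puts m[:gap]                       # begins right after the second section's opening <dict>
puts m[:gap].include?("</dict>")   # false: the lookahead is satisfied
puts m[:gap].include?("Keyword")   # true: the wanted name was skipped over
```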

Even if you add more conditions (using only the structure as a condition, not the contents), **Variable** will always come before **#F92672**, because both sections have the same structure.

So using an XML parser may be a better choice.
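As a rough illustration of the parser approach, here is a sketch using Ruby's REXML (a swap for the Nokogiri gem used in the other answer). It assumes the rule dicts sit under a single wrapper element; the <rules> wrapper and the helper plist_dict are hypothetical, and a real .tmTheme file wraps its rules in a plist instead. Each <dict> is read as alternating <key>/value children, so a name is always paired with the colour from its own section.

```ruby
require 'rexml/document'

# Hypothetical wrapper: the rule dicts are placed under one root element.
xml = <<~'XML'
  <rules>
    <dict>
      <key>name</key>
      <string>Variable</string>
      <key>settings</key>
      <dict>
        <key>fontStyle</key>
        <string></string>
      </dict>
    </dict>
    <dict>
      <key>name</key>
      <string>Keyword</string>
      <key>settings</key>
      <dict>
        <key>foreground</key>
        <string>#F92672</string>
      </dict>
    </dict>
  </rules>
XML

# Hypothetical helper: turn a plist-style <dict> (key, value, key, value, ...)
# into a Hash, recursing into nested <dict> values.
def plist_dict(dict)
  dict.elements.to_a.each_slice(2).map do |key, value|
    [key.text, value.name == 'dict' ? plist_dict(value) : value.text.to_s]
  end.to_h
end

doc = REXML::Document.new(xml)
colours = {}
doc.root.each_element('dict') do |rule|
  h = plist_dict(rule)
  fg = h.dig('settings', 'foreground') # nil when the rule sets no foreground
  colours[h['name']] = fg if fg
end
p colours
```

Rules without a foreground (such as Variable here) are simply skipped rather than borrowing a colour from the next section.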

Upvotes: 0

Mark Thomas

Reputation: 37517

Here's a way to do it with Nokogiri:

require 'nokogiri'

# xml holds the XML snippet from the question
theme = Nokogiri::XML.fragment(xml)
puts theme.xpath('./dict[1]/key[text()="name"]/following-sibling::string[1]').text
#=> "**Variable**"
puts theme.xpath('.//dict[preceding-sibling::key[1][text()="settings"]]/string').text
#=> "**#F92672**"

The first XPath takes the first dict, finds the key containing "name", and returns the text of the string element that follows it.

The second XPath looks for a dict immediately after a key containing "settings", and retrieves the text of its string element.

Note that if you're parsing a full document rather than the given fragment, you'll need to make a few changes, such as changing the call to theme = Nokogiri::XML.parse(xml) and removing the leading . from the XPath expressions.

Upvotes: 1
