tophersmith116

Reputation: 442

regex reject if found

I've already received some help here, but I now have a slightly different problem. I want to find cases where a DocumentBuilderFactory is created but entity expansion is never restricted (for example, setExpandEntityReferences(false) is never called on it). I have the following regex:

(?x)

# finds DocumentBuilderFactory creation and pulls out the variable name
# of the form DocumentBuilderFactory VARNAME = DocumentBuilderFactory.newInstance
# then checks if that variable name has one of three acceptable ways to stop XXE attacks
# matches any instance where the variable is initialized, but not restricted

(?:
   # This is for DocumentBuilderFactory VARNAME = DocumentBuilderFactory.newInstance with many possible alternates
   DocumentBuilderFactory
   [\s]+?
   (\w+)
   [\s]*?
   =
   [\s]*?
   (?:.*?DocumentBuilderFactory)
   [.\s]+
   newInstance.*

   # checks that the var name is NOT (using ?!) using one of the acceptable rejection methods
   (?!\1[.\s]+
      (?:setFeature\s*\(\s*"http://xml.org/sax/features/external-general-entities"\s*,\s*false\s*\)
        |setFeature\s*\(\s*"http://apache.org/xml/features/disallow-doctype-decl"\s*,\s*false\s*\)
        |setExpandEntityReferences\s*\(\s*false\s*\))
   )
)

and a test file could look like this:

// Set the parser properties
  javax.xml.parsers.DocumentBuilderFactory factory = 
    javax.xml.parsers.DocumentBuilderFactory.newInstance();
  factory.setNamespaceAware(true);
  factory.setValidating(false);
  factory.setExpandEntityReferences(false);
  factory.setIgnoringComments(true);
  factory.setIgnoringElementContentWhitespace(true);
  factory.setCoalescing(true);
  javax.xml.parsers.DocumentBuilder builder = factory.newDocumentBuilder();

Is there any way to make this regex fail on this file, since the file correctly calls factory.setExpandEntityReferences(false)?

Updated:

(?:
   DocumentBuilderFactory
   \s+
   (\w+)
   \s*
   =
   \s*
   (?:.*?DocumentBuilderFactory)
   \s*.\s*
   newInstance.*
   (?:[\s\S](?!
      \1\s*.\s*
      (?:setFeature\s*\(\s*"http://xml.org/sax/features/external-general-entities"\s*,\s*false\s*\)
      |setFeature\s*\(\s*"http://apache.org/xml/features/disallow-doctype-decl"\s*,\s*false\s*\)
      |setExpandEntityReferences\s*\(\s*false\s*\))
   ))*$
)

With this version, find() fails on the file above, as expected. However, if I misspell factory.setExpandEntityReferences(false) as factory.setExpandEntity##References(false), I would expect the regex to match, but it does not. Is there a way to get this to work?

Upvotes: 3

Views: 524

Answers (1)

Andrew Cheong

Reputation: 30273

Test for a string not existing to the end:

(?:.(?!xyz))*$

It basically means, "Every single character from this point forth must not be followed by xyz." Since . doesn't match newlines by default, though, you might want to generalize it to:

(?:[\s\S](?!xyz))*$
   ^^^^^^

(It's the union of two complementary sets, so it truly matches every character.)
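As a rough illustration of the two variants above, here is a minimal Java sketch. The class name, the literal "start" prefix, and the sample inputs are made up for the demo; only the (?:.(?!xyz))*$ and (?:[\s\S](?!xyz))*$ fragments come from the answer.

```java
import java.util.regex.Pattern;

public class NotFollowedDemo {
    // After "start", no remaining character may be followed by "xyz".
    // DOT_VERSION uses '.', which stops at line terminators by default.
    static final Pattern DOT_VERSION = Pattern.compile("start(?:.(?!xyz))*$");
    // ANY_VERSION uses [\s\S], which also consumes line terminators.
    static final Pattern ANY_VERSION = Pattern.compile("start(?:[\\s\\S](?!xyz))*$");

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found(ANY_VERSION, "start\nabc"));      // true
        System.out.println(found(ANY_VERSION, "start\nabc xyz"));  // false
        System.out.println(found(DOT_VERSION, "start abc"));       // true
        System.out.println(found(DOT_VERSION, "start\nabc"));      // false
    }
}
```

The last case fails only because . cannot step over the newline, so $ is never reached; that is the situation the [\s\S] form is meant to handle.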

To apply this to your case, just replace xyz with the thing you don't want appearing anywhere:

   # checks that the var name is NOT (using ?!) using one of the acceptable rejection methods
   (?:[\s\S](?!
       \1[.\s]+
       (?:setFeature\s*\(\s*"http://xml.org/sax/features/external-general-entities"\s*,\s*false\s*\)
         |setFeature\s*\(\s*"http://apache.org/xml/features/disallow-doctype-decl"\s*,\s*false\s*\)
         |setExpandEntityReferences\s*\(\s*false\s*\))
   ))*$
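To try this against the question's sample, here is one possible Java sketch. It assumes the first half of the original regex is kept as posted and the fragment above is appended after newInstance.*; the class name, the helper methods, and the shortened sample source are hypothetical.

```java
import java.util.regex.Pattern;

public class XxeRegexDemo {
    // The three calls the question treats as acceptable (copied as posted).
    static final String RESTRICTION =
        "(?:setFeature\\s*\\(\\s*\"http://xml.org/sax/features/external-general-entities\"\\s*,\\s*false\\s*\\)"
      + "|setFeature\\s*\\(\\s*\"http://apache.org/xml/features/disallow-doctype-decl\"\\s*,\\s*false\\s*\\)"
      + "|setExpandEntityReferences\\s*\\(\\s*false\\s*\\))";

    // Declaration part from the question, followed by the
    // "no restricting call anywhere up to the end" part from the answer.
    static final Pattern UNRESTRICTED = Pattern.compile(
        "DocumentBuilderFactory\\s+(\\w+)\\s*=\\s*"
      + "(?:.*?DocumentBuilderFactory)[.\\s]+newInstance.*"
      + "(?:[\\s\\S](?!\\1[.\\s]+" + RESTRICTION + "))*$");

    // True when a factory is created and never restricted afterwards.
    static boolean isUnrestricted(String source) {
        return UNRESTRICTED.matcher(source).find();
    }

    // Hypothetical, shortened version of the question's test file.
    static String sample(String middleLine) {
        return "  javax.xml.parsers.DocumentBuilderFactory factory =\n"
             + "    javax.xml.parsers.DocumentBuilderFactory.newInstance();\n"
             + "  factory.setNamespaceAware(true);\n"
             + "  " + middleLine + "\n"
             + "  factory.setCoalescing(true);\n";
    }

    public static void main(String[] args) {
        // Restricting call present: the regex should not be found.
        System.out.println(isUnrestricted(sample("factory.setExpandEntityReferences(false);")));   // false
        // Misspelled call: nothing restricts the factory, so the regex is found.
        System.out.println(isUnrestricted(sample("factory.setExpandEntity##References(false);"))); // true
    }
}
```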

Use word boundaries to match whole words (like identifiers):

Surely, when working with, say, factory, you wouldn't want to match old_factory! Use word boundaries to ensure you're capturing entire words.

In your case, just add a \b before the \1:

\b\1
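A small Java sketch of why the boundary matters. The patterns here are simplified stand-ins, not the full regex from the question, and the class name, make() call, and sample strings are invented for the demo.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Captures the declared variable name, then lazily scans ahead for a call on it.
    static final String DECL = "DocumentBuilderFactory\\s+(\\w+)\\s*=[\\s\\S]*?";
    static final String CALL = "\\1\\.setExpandEntityReferences";

    static final Pattern WITHOUT_BOUNDARY = Pattern.compile(DECL + CALL);
    static final Pattern WITH_BOUNDARY = Pattern.compile(DECL + "\\b" + CALL);

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String otherVariable = "DocumentBuilderFactory factory = make();\n"
                             + "old_factory.setExpandEntityReferences(false);";
        String sameVariable = "DocumentBuilderFactory factory = make();\n"
                            + "factory.setExpandEntityReferences(false);";

        // Without \b, the tail of "old_factory" is mistaken for "factory".
        System.out.println(found(WITHOUT_BOUNDARY, otherVariable)); // true
        // With \b, "_f" is not a word boundary, so the call is not attributed to "factory".
        System.out.println(found(WITH_BOUNDARY, otherVariable));    // false
        System.out.println(found(WITH_BOUNDARY, sameVariable));     // true
    }
}
```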

Simplify your character classes and escape literal dots:

As mentioned in the comments, \s includes \r and \n, so you can rewrite [\s\r\n] as \s (without the brackets).

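Also, an unescaped . outside a character class matches any character, so where you mean a literal dot you'd want to change instances like

\s*.\s*

to

\s*[.]\s*

The wildcard does not keep its special meaning within a character class the way \s or \w do: . just means a literal dot within a character class. (Leave newInstance.* alone, though; there the . really is meant as a wildcard.)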
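To see the difference between a wildcard dot and a literal dot, here is a short Java sketch. The two patterns and the inputs are made up for illustration and are much smaller than the regex in the question.

```java
import java.util.regex.Pattern;

public class LiteralDotDemo {
    // Unescaped '.' outside a character class: matches any single character.
    static final Pattern WILDCARD = Pattern.compile("Factory\\s*.\\s*newInstance");
    // '.' inside a character class: matches only a literal dot.
    static final Pattern LITERAL = Pattern.compile("Factory\\s*[.]\\s*newInstance");

    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(WILDCARD, "Factory.newInstance"));       // true
        System.out.println(matches(WILDCARD, "FactoryXnewInstance"));       // true (probably unintended)
        System.out.println(matches(LITERAL, "FactoryXnewInstance"));        // false
        System.out.println(matches(LITERAL, "Factory\n    .newInstance"));  // true
    }
}
```

Writing \. instead of [.] should behave the same way; which one to use is a matter of taste.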

Upvotes: 3
