Reputation: 442
I've already received some help here, but I'm having a slightly different problem. I'm looking to find cases where a DocumentBuilderFactory is created but ExpandEntityReferences has not been restricted. I have the following regex:
(?x)
# finds DocumentBuilderFactory creation and pulls out the variable name
# of the form DocumentBuilderFactory VARNAME = DocumentBuilderFactory.newInstance
# then checks if that variable name has one of three acceptable ways to stop XXE attacks
# matches any instance where the variable is initialized, but not restricted
(?:
# This is for DocumentBuilderFactory VARNAME = DocumentBuilderFactory.newInstance with many possible alternates
DocumentBuilderFactory
[\s]+?
(\w+)
[\s]*?
=
[\s]*?
(?:.*?DocumentBuilderFactory)
[.\s]+
newInstance.*
# checks that the var name is NOT (using ?!) using one of the acceptable rejection methods
(?!\1[.\s]+
(?:setFeature\s*\(\s*"http://xml.org/sax/features/external-general-entities"\s*,\s*false\s*\)
|setFeature\s*\(\s*"http://apache.org/xml/features/disallow-doctype-decl"\s*,\s*false\s*\)
|setExpandEntityReferences\s*\(\s*false\s*\))
)
)
and a test file could look like this:
// Set the parser properties
javax.xml.parsers.DocumentBuilderFactory factory =
javax.xml.parsers.DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(false);
factory.setExpandEntityReferences(false);
factory.setIgnoringComments(true);
factory.setIgnoringElementContentWhitespace(true);
factory.setCoalescing(true);
javax.xml.parsers.DocumentBuilder builder = factory.newDocumentBuilder();
Is there any way to run this regex on this file and have it fail to match, because the file correctly calls factory.setExpandEntityReferences(false)?
Updated:
(?:
DocumentBuilderFactory
\s+
(\w+)
\s*
=
\s*
(?:.*?DocumentBuilderFactory)
\s*.\s*
newInstance.*
(?:[\s\S](?!
\1\s*.\s*
(?:setFeature\s*\(\s*"http://xml.org/sax/features/external-general-entities"\s*,\s*false\s*\)
|setFeature\s*\(\s*"http://apache.org/xml/features/disallow-doctype-decl"\s*,\s*false\s*\)
|setExpandEntityReferences\s*\(\s*false\s*\))
))*$
)
With this version, find() returns false, as expected. However, if I misspell factory.setExpandEntityReferences(false) as factory.setExpandEntity##References(false), I would expect the regex to match, but it does not. Is there a way to get this to work?
Upvotes: 3
Views: 524
Reputation: 30273
(?:.(?!xyz))*$
It basically means, "Every single character from this point forth must not be followed by xyz." Since . doesn't match newlines, though, you might want to generalize it to:
(?:[\s\S](?!xyz))*$
   ^^^^^^
(It's the union of two complementary sets, \s and \S, and therefore matches truly all characters.)
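A minimal Java sketch of the difference between the two forms. The "start" prefix, the class name, and the sample strings are made up for illustration; only the two lookahead fragments come from the text above.

```java
import java.util.regex.Pattern;

class TemperedLookaheadDemo {
    // Hypothetical anchor "start", followed by the dot-based form.
    static final Pattern DOT_FORM = Pattern.compile("start(?:.(?!xyz))*$");
    // Same idea, but [\s\S] also consumes line terminators.
    static final Pattern ANY_FORM = Pattern.compile("start(?:[\\s\\S](?!xyz))*$");

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        // No "xyz" anywhere after "start": the loop reaches the end of input.
        System.out.println(found(ANY_FORM, "start\nabc\ndef")); // true
        // "xyz" appears later: the loop stops early, so $ cannot match.
        System.out.println(found(ANY_FORM, "start\nabc xyz"));  // false
        // The dot cannot step over the first newline, so $ is never reached.
        System.out.println(found(DOT_FORM, "start\nabc\ndef")); // false
    }
}
```

Note that the lookahead is tested after each consumed character, so the check starts one character past the point where the group begins.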
To apply this to your case, just replace xyz with the thing you don't want appearing anywhere:
# checks that the var name is NOT (using ?!) using one of the acceptable rejection methods
(?:[\s\S](?!
\1[.\s]+
(?:setFeature\s*\(\s*"http://xml.org/sax/features/external-general-entities"\s*,\s*false\s*\)
|setFeature\s*\(\s*"http://apache.org/xml/features/disallow-doctype-decl"\s*,\s*false\s*\)
|setExpandEntityReferences\s*\(\s*false\s*\))
))*$
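As a rough check, the fragment above can be joined to the first half of the regex from the question and run with java.util.regex. This is only a sketch: the class name, the helper method, and the shortened sample inputs are assumptions, and the pattern is written on one line rather than in (?x) comment mode.

```java
import java.util.regex.Pattern;

class XxeRegexDemo {
    // The three accepted calls, copied from the regex in the question.
    static final String SAFE_CALL =
        "(?:setFeature\\s*\\(\\s*\"http://xml.org/sax/features/external-general-entities\"\\s*,\\s*false\\s*\\)"
        + "|setFeature\\s*\\(\\s*\"http://apache.org/xml/features/disallow-doctype-decl\"\\s*,\\s*false\\s*\\)"
        + "|setExpandEntityReferences\\s*\\(\\s*false\\s*\\))";

    // Factory creation, then "no accepted call on that variable up to end of input".
    static final Pattern UNRESTRICTED = Pattern.compile(
        "DocumentBuilderFactory\\s+(\\w+)\\s*=\\s*(?:.*?DocumentBuilderFactory)[.\\s]+newInstance.*"
        + "(?:[\\s\\S](?!\\1[.\\s]+" + SAFE_CALL + "))*$");

    static boolean isUnrestricted(String source) {
        return UNRESTRICTED.matcher(source).find();
    }

    public static void main(String[] args) {
        String header = "javax.xml.parsers.DocumentBuilderFactory factory =\n"
            + "javax.xml.parsers.DocumentBuilderFactory.newInstance();\n"
            + "factory.setValidating(false);\n";

        // Restricted factory: the regex should not be found.
        System.out.println(isUnrestricted(header + "factory.setExpandEntityReferences(false);\n")); // false
        // No restricting call at all: the regex should be found.
        System.out.println(isUnrestricted(header + "factory.setCoalescing(true);\n")); // true
        // Misspelled call from the question: also found.
        System.out.println(isUnrestricted(header + "factory.setExpandEntity##References(false);\n")); // true
    }
}
```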
Surely, when working with, say, factory, you wouldn't want to match old_factory! Use word boundaries to ensure you're capturing entire words. In your case, just add a \b before the \1:
\b\1
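The effect of the word boundary can be seen with a deliberately simplified pattern. Everything in this sketch (the class name, the two patterns, the sample source strings, the create() call) is hypothetical and much shorter than the real regex; it only isolates the \b\1 part.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Capture a variable name at its assignment, then look for a later call on that name.
    static final Pattern LOOSE =
        Pattern.compile("\\b(\\w+)\\s*=[\\s\\S]*?\\1\\.setExpandEntityReferences");
    // Same pattern, but the backreference must start at a word boundary.
    static final Pattern STRICT =
        Pattern.compile("\\b(\\w+)\\s*=[\\s\\S]*?\\b\\1\\.setExpandEntityReferences");

    static boolean found(Pattern p, String source) {
        return p.matcher(source).find();
    }

    public static void main(String[] args) {
        String otherVariable = "factory = create();\nold_factory.setExpandEntityReferences(false);";
        String sameVariable = "factory = create();\nfactory.setExpandEntityReferences(false);";

        // Without \b, "factory" is found inside "old_factory".
        System.out.println(found(LOOSE, otherVariable));  // true
        // With \b, the call on old_factory no longer counts.
        System.out.println(found(STRICT, otherVariable)); // false
        // A call on the captured variable itself is still found.
        System.out.println(found(STRICT, sameVariable));  // true
    }
}
```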
As mentioned in the comments, \s includes \r and \n, so you can rewrite [\s\r\n] as \s (without the brackets).
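A small Java check that the two spellings accept the same input. The surrounding "a" and "b" and the class name are placeholders chosen for this sketch.

```java
import java.util.regex.Pattern;

class WhitespaceClassDemo {
    // Redundant form: \r and \n are already covered by \s.
    static boolean verbose(String input) {
        return Pattern.matches("a[\\s\\r\\n]+b", input);
    }

    // Equivalent shorter form.
    static boolean compact(String input) {
        return Pattern.matches("a\\s+b", input);
    }

    public static void main(String[] args) {
        String mixed = "a\r\n\t b";
        System.out.println(verbose(mixed)); // true
        System.out.println(compact(mixed)); // true
        System.out.println(compact("a-b")); // false
    }
}
```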
Also, you'd want to change instances like
newInstance.*
to
newInstance[.]*
Unlike \s or \w, the wildcard . loses its special meaning inside a character class: there, . matches only a literal dot.
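A short Java sketch of what each spelling actually consumes. The class name, the helper method, and the sample inputs are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotInClassDemo {
    // Outside a character class, . is a wildcard (any character except a line terminator).
    static final Pattern WILDCARD = Pattern.compile("newInstance.*");
    // Inside a character class, . is a literal dot.
    static final Pattern LITERAL = Pattern.compile("newInstance[.]*");

    // Returns the matched text, or null if the pattern is not found.
    static String matched(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matched(WILDCARD, "newInstance();"));  // newInstance();
        System.out.println(matched(LITERAL, "newInstance();"));   // newInstance
        System.out.println(matched(LITERAL, "newInstance...x"));  // newInstance...
    }
}
```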
Upvotes: 3