Vuudi

Reputation: 94

How to properly escape reserved regex characters in JSON?

I have a JSON file that contains some regular expressions that I want to use in my Python code. The problem arises when I try to escape reserved regex characters in the JSON file. When I run the Python code, it can't process the JSON file and throws an exception.

I have debugged the code and concluded that it fails when calling json.loads(ruleFile.read()). Apparently only some characters can be escaped in JSON, and the dot is not one of them, which causes a syntax error.

try:
    with open(args.rules, "r") as ruleFile:
        rules = json.loads(ruleFile.read())
        for rule in rules:
            rules[rule] = re.compile(rules[rule])
except (IOError, ValueError) as e:
    raise Exception("Error reading rules file")

The JSON file:

{
    "Rule 1": "www\.[a-z]{3,10}\.com"
}

The traceback:

Traceback (most recent call last):
  File "foo.py", line 375, in <module>
    main()
  File "foo.py", line 67, in main
    raise Exception("Error reading rules file")
Exception: Error reading rules file

How do I work around this JSON syntax problem?

Upvotes: 4

Views: 3136

Answers (2)

Serge Ballesta

Reputation: 148965

The rule is to start from a correct string in a correct Python dictionary. Backslashes have to be escaped in Python string literals as well, which a raw string (the r prefix) takes care of.

So you should initially write:

rules = {"Rule 1": r"www\.[a-z]{3,10}\.com"}

You can then easily convert that to a JSON string:

print(json.dumps(rules, indent=4))

{
    "Rule 1": "www\\.[a-z]{3,10}\\.com"
}

You now know how the json file containing the regexes should be formatted.
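
To check that this formatting really round-trips, here is a minimal sketch that dumps the dictionary, loads it back, and compiles the result. The names encoded, decoded, compiled and matches, and the test hostname www.example.com, are only illustrative and not part of the original code.

```python
import json
import re

# A raw string keeps the backslashes literal in Python source.
rules = {"Rule 1": r"www\.[a-z]{3,10}\.com"}

# json.dumps doubles each backslash so that the output is valid JSON.
encoded = json.dumps(rules, indent=4)

# json.loads undoes the escaping and returns the original pattern text.
decoded = json.loads(encoded)

# Compile into a new dict instead of modifying the loaded one in place.
compiled = {name: re.compile(pattern) for name, pattern in decoded.items()}

# Illustrative check against a made-up hostname.
matches = compiled["Rule 1"].fullmatch("www.example.com") is not None
```

After the round trip, decoded equals rules, so the regex engine sees a single backslash before each dot even though the JSON text contains two.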

Upvotes: 1

Robby Cornelissen

Reputation: 97152

The backslash needs to be escaped in JSON.

{
    "Rule 1": "www\\.[a-z]{3,10}\\.com"
}

From the linked reference:

The following characters are reserved in JSON and must be properly escaped to be used in strings:

  • Backspace is replaced with \b
  • Form feed is replaced with \f
  • Newline is replaced with \n
  • Carriage return is replaced with \r
  • Tab is replaced with \t
  • Double quote is replaced with \"
  • Backslash is replaced with \\
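
As a rough illustration of the difference, the sketch below parses both versions of the rule with Python's json module. The variable names bad, good, bad_error and pattern are made up for this example, and the JSON text is inlined as raw strings rather than read from a file.

```python
import json

# JSON text as it would appear in the file; raw strings keep it byte-for-byte.
bad = r'{"Rule 1": "www\.[a-z]{3,10}\.com"}'     # \. is not a valid JSON escape
good = r'{"Rule 1": "www\\.[a-z]{3,10}\\.com"}'  # \\ decodes to one backslash

try:
    json.loads(bad)
    bad_error = None
except json.JSONDecodeError as exc:
    # JSONDecodeError is a subclass of ValueError, so the original
    # "except (IOError, ValueError)" clause catches it as well.
    bad_error = exc.msg

# The valid version decodes to the regex with single backslashes.
pattern = json.loads(good)["Rule 1"]
```

Logging or re-raising the caught exception, instead of replacing it with a generic message, would show the position of the invalid escape directly.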

Upvotes: 1
