devin

Reputation: 6527

Python remove JSON substring

If I have a string where there is a valid JSON substring like this one:

 mystr = '100{"1":2, "3":4}312'

What is the best way to extract just the JSON string? The characters outside it can be anything (except a { or }), including newlines and things like that.

Just to be clear, this is the result I want

  newStr = '{"1":2, "3":4}'

The best way I can think of to do this is to use find and rfind and then take the substring. This seems too verbose to me, and I don't think it is Python 3.0 compliant (which I would prefer, but it is not essential).
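For reference, a minimal sketch of the find/rfind approach described above. The function name extract_between_braces is hypothetical, and the sketch assumes the only braces in the string belong to the JSON substring. The str.find and str.rfind methods used here are available in Python 3 as well.

```python
def extract_between_braces(text):
    """Return the text from the first '{' to the last '}', or None if absent."""
    start = text.find("{")   # index of the first opening brace, -1 if missing
    end = text.rfind("}")    # index of the last closing brace, -1 if missing
    if start == -1 or end < start:
        return None
    return text[start:end + 1]


mystr = '100{"1":2, "3":4}312'
new_str = extract_between_braces(mystr)
print(new_str)
```

This does not check that the slice is valid JSON; it only trims whatever surrounds the outermost braces.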

Any help is appreciated.

Upvotes: 4

Views: 1233

Answers (1)

Scott A

Reputation: 7834

Note that the following code assumes that the text on either side of the JSON string contains no braces of its own.

import re
matcher = re.compile(r"""
^[^\{]*          # Starting from the beginning of the string, match anything that isn't an opening bracket
       (         # Open a group to record what's next
        \{.+\}   # The JSON substring
       )         # close the group
 [^}]*$          # at the end of the string, anything that isn't a closing bracket
""", re.VERBOSE | re.DOTALL)  # DOTALL lets .+ span newlines inside the JSON

# Your example
print(matcher.match('100{"1":2, "3":4}312').group(1))

# Example with a nested object
print(matcher.match('100{"1":{"a":"b", "c":"d"}, "3":4}312').group(1))

The short, non-precompiled, non-commented version:

import re
print(re.match(r"^[^{]*(\{.+\})[^}]*$", '100{"1":2, "3":4}312').group(1))

Although for the sake of maintenance, commenting regular expressions is very much preferred.
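If you also want to confirm that the extracted text really is valid JSON, a different technique from the regex above is to let the standard library's json.JSONDecoder.raw_decode parse from each opening brace and report where the document ends. This is only a sketch under that assumption; the function name extract_json_substring is hypothetical.

```python
import json


def extract_json_substring(text):
    """Return the first substring starting at '{' that parses as JSON, else None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode returns (parsed_object, index_just_past_the_document)
            _, end = decoder.raw_decode(text, start)
            return text[start:end]
        except ValueError:
            # Not valid JSON from this brace; try the next opening brace
            start = text.find("{", start + 1)
    return None


print(extract_json_substring('100{"1":2, "3":4}312'))
print(extract_json_substring('100{"1":{"a":"b", "c":"d"}, "3":4}312'))
```

Unlike the regex, this tolerates braces in the trailing text, at the cost of actually parsing the JSON.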

Upvotes: 6
