Reputation: 6527
If I have a string where there is a valid JSON substring like this one:
mystr = '100{"1":2, "3":4}312'
What is the best way to extract just the JSON string? The text outside it can be anything (except a { or }), including newlines and things like that.
Just to be clear, this is the result I want:
newStr = '{"1":2, "3":4}'
The best way I can think of to do this is to use find and rfind and then take the substring. This seems too verbose to me, and it isn't Python 3.0 compliant (which I would prefer, but it is not essential).
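A minimal sketch of the find/rfind approach described above, using the string methods str.find and str.rfind. The helper name extract_braced is only illustrative, and the sketch assumes the JSON runs from the first { to the last } in the string:

```python
def extract_braced(text):
    """Return the substring from the first '{' to the last '}', inclusive."""
    start = text.find("{")   # index of the first opening brace, or -1
    end = text.rfind("}")    # index of the last closing brace, or -1
    if start == -1 or end < start:
        raise ValueError("no braced substring found")
    return text[start:end + 1]


mystr = '100{"1":2, "3":4}312'
new_str = extract_braced(mystr)
print(new_str)
```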
Any help is appreciated.
Upvotes: 4
Views: 1233
Reputation: 7834
Note that the following code assumes that the text on either side of the JSON string contains no braces ({ or }).
import re
matcher = re.compile(r"""
^[^\{]* # Starting from the beginning of the string, match anything that isn't an opening bracket
( # Open a group to record what's next
\{.+\} # The JSON substring
) # close the group
[^}]*$ # at the end of the string, anything that isn't a closing bracket
""", re.VERBOSE)
# Your example
print(matcher.match('100{"1":2, "3":4}312').group(1))
# Example with embedded hashmap
print(matcher.match('100{"1":{"a":"b", "c":"d"}, "3":4}312').group(1))
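One caveat, since the question mentions newlines: by default . in a regular expression does not match a newline, so a pattern built on \{.+\} will not match when the JSON itself spans several lines. A hedged sketch of the same pattern with the re.DOTALL flag added (the multiline sample string is made up for illustration):

```python
import re

# Same pattern as above, but re.DOTALL lets "." also match newlines
matcher_multiline = re.compile(r"^[^{]*(\{.+\})[^}]*$", re.DOTALL)

# Hypothetical input where the JSON itself contains line breaks
multiline = '100\n{"1": 2,\n "3": 4}\n312'

extracted = matcher_multiline.match(multiline).group(1)
print(extracted)
```

Without re.DOTALL, match() returns None for this input, so calling .group(1) on the result would raise an AttributeError.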
The short, non-precompiled, non-commented version:
import re
print(re.match(r"^[^\{]*(\{.+\})[^}]*$", '100{"1":2, "3":4}312').group(1))
For the sake of maintenance, though, commenting regular expressions is very much preferred.
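As an alternative to a regular expression, the standard library's json.JSONDecoder.raw_decode can parse a JSON value starting at a given index and report where it ends. The following is only a sketch of that idea, not part of the regex approach above. The helper name extract_json is hypothetical, and it assumes the first parseable {...} in the string is the one wanted:

```python
import json


def extract_json(text):
    """Return (json_substring, parsed_object) for the first JSON object in text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode returns the parsed object and the index just past it
            obj, end = decoder.raw_decode(text, start)
        except ValueError:
            # Not valid JSON from this brace; try the next opening brace
            start = text.find("{", start + 1)
            continue
        return text[start:end], obj
    raise ValueError("no JSON object found")


substring, parsed = extract_json('100{"1":{"a":"b", "c":"d"}, "3":4}312')
print(substring)
```

Because the decoder does the parsing, this also validates the JSON and copes with nested objects and with newlines inside the JSON.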
Upvotes: 6