Reputation: 45
I am trying to parse data with regular expressions (Python's re module). The data I need to parse is:
"comments":
{
[
{ "id" : "001",
"x" : "2",
"name" : "Chuck"
} ,
{ "id" : "009",
"x" : "7",
"name" : "Chuck"
}
]
}
I use urllib to read the text above into a string, but I don't need all of it. I only want this part:
[
{ "id" : "001",
"x" : "2",
"name" : "Chuck"
} ,
{ "id" : "009",
"x" : "7",
"name" : "Chuck"
}
]
I have tried using regular expressions, but I think I am doing something wrong. My regular expression is:
y = re.findall("([.])", html)
I interpret it as finding all characters between [ and ] and saving them in y.
Upvotes: 0
Views: 93
Reputation: 49318
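You'll need to escape the brackets with a backslash (unescaped, [.] is a character class that matches a single literal dot, not "everything between brackets"). Then mark the . as repeating with *?, which is non-greedy, so the match stops at the first closing bracket instead of running on past it. Use the re.DOTALL flag so that . also matches newlines. You can then pass the matched string to ast.literal_eval() to turn it into a Python list: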
import re
import ast
s = ''' "comments":
{
[
{ "id" : "001",
"x" : "2",
"name" : "Chuck"
} ,
{ "id" : "009",
"x" : "7",
"name" : "Chuck"
}
]
}'''
Result:
>>> ast.literal_eval(re.search(r'\[.*?\]', s, re.DOTALL).group(0))
[{'name': 'Chuck', 'x': '2', 'id': '001'}, {'name': 'Chuck', 'x': '7', 'id': '009'}]
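A self-contained sketch of the same approach, in case you want to run it end to end. It assumes the text has already been read into a string (the sample below is the data from the question), and the helper name extract_first_list is made up for illustration. It also shows what the original ([.]) pattern actually matches. Like the answer above, it assumes the first ] closes the list, so nested brackets would need a different approach.

```python
import ast
import re

# Sample input: the data from the question, already read into a string.
s = ''' "comments":
{
[
{ "id" : "001",
"x" : "2",
"name" : "Chuck"
} ,
{ "id" : "009",
"x" : "7",
"name" : "Chuck"
}
]
}'''

# The original pattern: inside [...] the dot is literal, so [.] finds
# single dot characters, not the text between brackets.
print(re.findall("([.])", "version 1.2 [abc]"))  # ['.']

def extract_first_list(text):
    """Hypothetical helper: parse the first [...] block in text, or return None."""
    # \[ and \] match literal brackets; .*? is non-greedy, so the match ends
    # at the first closing bracket; re.DOTALL lets . match newlines.
    match = re.search(r'\[.*?\]', text, re.DOTALL)
    if match is None:
        return None
    # The matched block is a valid Python literal (a list of dicts).
    return ast.literal_eval(match.group(0))

records = extract_first_list(s)
print([r["id"] for r in records])  # ['001', '009']
```

Returning None instead of calling .group(0) directly avoids an AttributeError when the text contains no brackets at all.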
Upvotes: 1
Reputation: 2999
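Match an opening bracket, then one or more characters that are not a closing bracket, then the closing bracket (here x is the downloaded text):
re.search(r'\[[^\]]+\]', x).group(0)
Because [^\]] matches any character except ], newlines included, this pattern needs no re.DOTALL flag. You can also skip regular expressions and slice the string; the + 1 keeps the closing bracket, which would otherwise be cut off:
x[x.find('['):x.find(']') + 1]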
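Both ideas (a negated character class, and plain string slicing) can be checked against the question's data with a short script. The helper names extract_with_regex and extract_with_slice are hypothetical. The slice includes the closing bracket by adding 1 to the index returned by find(']'). The result is parsed with json.loads() because the extracted block happens to be valid JSON; that is a swapped-in choice, and ast.literal_eval() would work as well. Both helpers assume the first ] closes the list.

```python
import json
import re

html = '''"comments":
{
[
{ "id" : "001", "x" : "2", "name" : "Chuck" } ,
{ "id" : "009", "x" : "7", "name" : "Chuck" }
]
}'''

def extract_with_regex(text):
    # [^\]]+ matches one or more characters other than ']', newlines included,
    # so no re.DOTALL flag is needed.
    match = re.search(r'\[[^\]]+\]', text)
    return match.group(0) if match else None

def extract_with_slice(text):
    start = text.find('[')
    end = text.find(']')
    # find() returns -1 when the character is missing.
    if start == -1 or end < start:
        return None
    # + 1 so the slice includes the closing bracket.
    return text[start:end + 1]

print(extract_with_regex(html) == extract_with_slice(html))  # True
print(json.loads(extract_with_slice(html))[1]["x"])  # 7
```

The slicing version avoids regular expressions entirely, which is arguably simpler when the text has exactly one flat list.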
Upvotes: 2