Pablo

Reputation: 45

Parsing list value in a dictionary

I am trying to parse data with regular expressions (the re module). The data I have to parse is:

  "comments":
{

[
{ "id" : "001",
  "x" : "2",
  "name" : "Chuck"
} ,
{ "id" : "009",
  "x" : "7",
  "name" : "Chuck"
} 
]

}

Using urllib I copy the text above into a string, but I don't want all the text. I just want this:

[
{ "id" : "001",
  "x" : "2",
  "name" : "Chuck"
} ,
{ "id" : "009",
  "x" : "7",
  "name" : "Chuck"
}
]

I have tried using regular expressions but I think I am doing something wrong. My regular expression is:

y = re.findall("([.])", html)

I interpret it as finding all characters between [ and ] and saving it in y.

Upvotes: 0

Views: 93

Answers (2)

TigerhawkT3

Reputation: 49318

You'll need to escape the brackets with a backslash, and mark the . as repeating with *, followed by ? to make the match non-greedy so that it stops at the first closing bracket. Use the re.DOTALL flag to make . match newlines as well. You can then pass the matched string to ast.literal_eval() to evaluate it:

import re
import ast
s = '''  "comments":
{

[
{ "id" : "001",
  "x" : "2",
  "name" : "Chuck"
} ,
{ "id" : "009",
  "x" : "7",
  "name" : "Chuck"
} 
]

}'''

Result:

>>> ast.literal_eval(re.search(r'\[.*?\]', s, re.DOTALL).group(0))
[{'name': 'Chuck', 'x': '2', 'id': '001'}, {'name': 'Chuck', 'x': '7', 'id': '009'}]
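
The same steps can be collected into a small script. This is only a minimal sketch, assuming Python 3 and the sample text from the question; the helper name extract_list is made up for illustration and is not part of re or ast.

```python
import ast
import re

# Sample text from the question, as it would look after being read into a string.
s = '''  "comments":
{

[
{ "id" : "001",
  "x" : "2",
  "name" : "Chuck"
} ,
{ "id" : "009",
  "x" : "7",
  "name" : "Chuck"
}
]

}'''


def extract_list(text):
    """Hypothetical helper: return the first [...] block as a Python list, or None."""
    # \[ and \] match literal brackets, .*? is a non-greedy repeat,
    # and re.DOTALL lets . match newlines too.
    match = re.search(r'\[.*?\]', text, re.DOTALL)
    if match is None:
        return None
    return ast.literal_eval(match.group(0))


comments = extract_list(s)
print(comments[0]["name"])  # prints Chuck
```

Checking for None before calling group() avoids an AttributeError when the text contains no bracketed block. Note that the non-greedy match stops at the first ], so this only works while the list itself contains no nested brackets.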

Upvotes: 1

baldr

Reputation: 2999

  • One way: add braces around the text and parse it as JSON
  • Another way: use the regex \[[^\]]+\]
  • Third way: extract it by hand with x[x.find('['):x.find(']') + 1] (the + 1 keeps the closing bracket in the slice)
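
The regex and slicing options can be sketched as follows. This assumes Python 3 and that the text holds a single list with no nested brackets; the variable names and the condensed sample string are illustrative, and json.loads is used here simply because the extracted block happens to be valid JSON.

```python
import json
import re

# Condensed version of the sample text from the question.
x = ('"comments": { [ { "id" : "001", "x" : "2", "name" : "Chuck" } , '
     '{ "id" : "009", "x" : "7", "name" : "Chuck" } ] }')

# Regex: a literal [, then one or more characters that are not ], then a literal ].
# The negated class [^\]] also matches newlines, so no DOTALL flag is needed.
by_regex = re.search(r'\[[^\]]+\]', x).group(0)

# Slicing: from the first [ through the first ]; the + 1 includes the closing bracket.
by_slice = x[x.find('['):x.find(']') + 1]

items = json.loads(by_regex)
print(by_regex == by_slice)            # prints True
print([item["id"] for item in items])  # prints ['001', '009']
```

Both approaches pick out the same substring here; str.find returns -1 when a bracket is missing, so real input would need a check before slicing.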

Upvotes: 2
