Reputation: 531
I'm parsing a JSON file produced by a web crawl and need to extract only the Spanish text; the file contains text in both English and Spanish. The JSON is a list of dictionaries. I need to extract the value of the 'text' key from each dictionary whose 'humanLanguage' value is 'es'.
Currently my code for extracting all the text is:
import json
import urllib2

url = urllib2.urlopen('https://website_data.json')
obj = json.load(url)
text = [li['text'] for li in obj]
Since 'humanLanguage' is a key on the same level as 'text' I tried this as a first pass to isolate the value:
for value1 in obj[0]['humanLanguage']:
    print value1
but this prints out "en" vertically. At least I know this is a way to find the tag and identify either English or Spanish, but I don't know why it's printing it vertically and also don't know how to fix that.
What I want to do is have an "if" statement that says if 'humanLanguage' == 'es', then print the text. But I keep failing to find the right way to write this expression.
Am I on the right track here? Is an 'if' statement the way to achieve this, and if so, what is the expression I should construct? Or is there a better way?
Upvotes: 0
Views: 34
Reputation: 455
I'm assuming your data looks something like:
[
    {"humanLanguage": "en", "text": "Some english text 1"},
    {"humanLanguage": "es", "text": "Some spanish text 1"},
    {"humanLanguage": "en", "text": "Some english text 2"},
    {"humanLanguage": "es", "text": "Some spanish text 2"},
    ... etc ...
]
If you want to get a list of all the text fields, but only where the corresponding humanLanguage field == 'es', try this:
text = [li['text'] for li in obj if li['humanLanguage'] == 'es']
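Here is a minimal, self-contained sketch of that filter. The sample records are made up to match the structure I'm assuming, and I've swapped in `.get` (my addition, not something your data necessarily needs) so that a record with no 'humanLanguage' key is skipped instead of raising a KeyError:

```python
# Hypothetical sample data standing in for the parsed JSON list.
obj = [
    {"humanLanguage": "en", "text": "Some english text 1"},
    {"humanLanguage": "es", "text": "Some spanish text 1"},
    {"text": "Record with no language tag"},
    {"humanLanguage": "es", "text": "Some spanish text 2"},
]

# Keep only the text of records tagged as Spanish.
# dict.get returns None for a missing key, so untagged records are skipped.
spanish_text = [li["text"] for li in obj if li.get("humanLanguage") == "es"]
```

If every record in your crawl is guaranteed to have the key, plain `li['humanLanguage']` works just as well.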
Then you can print them all out like:
for t in text:
    print(t)
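As for why your first attempt printed "en" vertically: obj[0]['humanLanguage'] is a string, and looping over a string visits one character at a time, so each letter was printed on its own line. A small sketch of the difference, where `record` is just a made-up stand-in for obj[0]:

```python
# Hypothetical stand-in for obj[0].
record = {"humanLanguage": "en", "text": "Some english text 1"}

# Iterating over a string yields its characters one by one,
# which is why the original loop printed "e" and "n" on separate lines.
chars = [ch for ch in record["humanLanguage"]]

# To test the language, compare the whole string instead of looping over it.
is_spanish = record["humanLanguage"] == "es"
```

So an 'if' on the whole value is the right idea; the list comprehension above is the same test written as a filter.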
Upvotes: 1