topplethepat

Reputation: 531

Syntax for "if" statement when parsing list of dictionaries to extract value

I'm parsing a JSON file produced by a web crawl and need to extract only the Spanish text; the file contains text in both English and Spanish. The JSON is a list of dictionaries. I need the 'text' value from each dictionary where the value of the key 'humanLanguage' is 'es'.

Currently my code for extracting all the text is:

    import json
    import urllib2

    url = urllib2.urlopen('https://website_data.json')
    obj = json.load(url)
    text = [li['text'] for li in obj]

Since 'humanLanguage' is a key on the same level as 'text' I tried this as a first pass to isolate the value:

    for value1 in obj[0]['humanLanguage']:
        print value1

but this prints "en" vertically, one letter per line. At least I know this is a way to find the tag and identify either English or Spanish, but I don't know why it prints vertically or how to fix that.

What I want to do is have an "if" statement that says if 'humanLanguage' == 'es', then print the text. But I keep failing to find the right way to write this expression.

Am I on the right track here? Is an 'if' statement the way to achieve this, and if so, what is the expression I should construct? Or is there a better way?

Upvotes: 0

Views: 34

Answers (1)

almiki

Reputation: 455

I'm assuming your data looks something like:

    [
      {"humanLanguage": "en", "text": "Some english text 1"},
      {"humanLanguage": "es", "text": "Some spanish text 1"},
      {"humanLanguage": "en", "text": "Some english text 2"},
      {"humanLanguage": "es", "text": "Some spanish text 2"},
      ... etc ...
    ]
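
To check that assumption against the real file, a quick sketch along these lines may help. The `raw` string is a made-up stand-in for the downloaded content, not the actual crawl output.

```python
import json

# Hypothetical stand-in for the contents of the crawl output file.
raw = (
    '[{"humanLanguage": "en", "text": "Some english text 1"},'
    ' {"humanLanguage": "es", "text": "Some spanish text 1"}]'
)

obj = json.loads(raw)

# A JSON array parses to a Python list; each JSON object becomes a dict.
print(type(obj).__name__)      # list
print(sorted(obj[0].keys()))   # ['humanLanguage', 'text']
```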

If you want a list of all the text fields, but only from entries whose humanLanguage field equals 'es', add a condition to your list comprehension:

    text = [li['text'] for li in obj if li['humanLanguage'] == 'es']

Then you can print them all out like:

    for t in text:
        print(t)
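
On the other two points in the question: looping over `obj[0]['humanLanguage']` iterates over the characters of the string `'en'`, which is why it printed one letter per line, and an explicit `if` inside a `for` loop works just as well as the comprehension. Below is a minimal sketch of both, using made-up sample data in place of the real crawl output. The `sample` list and the `spanish_texts` helper are hypothetical names for illustration, and the use of `.get()` is an optional choice that assumes some entries might lack a `humanLanguage` key.

```python
# Hypothetical sample standing in for the parsed JSON (a list of dicts).
sample = [
    {"humanLanguage": "en", "text": "Some english text 1"},
    {"humanLanguage": "es", "text": "Some spanish text 1"},
    {"text": "Entry with no language tag"},
    {"humanLanguage": "es", "text": "Some spanish text 2"},
]

# Iterating over a string yields its characters one at a time,
# which is what produced the "vertical" output in the question.
chars = [c for c in sample[0]["humanLanguage"]]


def spanish_texts(records):
    """Return the 'text' of every record whose 'humanLanguage' is 'es'."""
    result = []
    for record in records:
        # .get() returns None when the key is missing, so such records are skipped.
        if record.get("humanLanguage") == "es":
            result.append(record["text"])
    return result


for t in spanish_texts(sample):
    print(t)
```

If every entry is guaranteed to have the key, `record["humanLanguage"] == "es"` behaves the same and fails loudly on malformed data, which may be preferable.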

Upvotes: 1
