iverkordan

Reputation: 47

Extract data from a complicated data structure in Python

I have a data structure like this:

[ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
  {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

It is a list containing many dictionaries, each of which has 3 pairs: 'uid': 'test_subject145', 'class':'?', 'data':[]. In the last pair, 'data', the value is a list that in turn contains a dictionary with 2 pairs: 'chunk':1, 'writing':[]. In the pair 'writing', the value is a list that again contains many lists. What I want to extract is the content of all those sentences, like 'this is exciting' and 'you are good', and put them into a simple list. Its final form should be list_final = ['this is exciting', 'you are good', 'he died',... ]

Upvotes: 3

Views: 1130

Answers (3)

Harvey

Reputation: 5821

tl;dr

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]

Just go slow and do one layer at a time. Then refactor your code to make it smaller.

data = [{'class': '?',
         'data': [{'chunk': 1,
                   'writing': [['this is exciting'], ['you are good']]}],
         'uid': 'test_subject145'},
        {'class': '?',
         'data': [{'chunk': 2,
                   'writing': [['he died'], ['go ahead']]}],
         'uid': 'test_subject166'}]

for d in data:
    print(d)
# {'class': '?', 'uid': 'test_subject145', 'data': [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]}
# {'class': '?', 'uid': 'test_subject166', 'data': [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]}

for d in data:
    data_list = d['data']
    print(data_list)
# [{'writing': [['this is exciting'], ['you are good']], 'chunk': 1}]
# [{'writing': [['he died'], ['go ahead']], 'chunk': 2}]

for d in data:
    data_list = d['data']
    for d2 in data_list:
        print(d2)
# {'writing': [['this is exciting'], ['you are good']], 'chunk': 1}
# {'writing': [['he died'], ['go ahead']], 'chunk': 2}

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        print(writing_list)
# [['this is exciting'], ['you are good']]
# [['he died'], ['go ahead']]

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            print(writing_sub_list)
# ['this is exciting']
# ['you are good']
# ['he died']
# ['go ahead']

for d in data:
    data_list = d['data']
    for d2 in data_list:
        writing_list = d2['writing']
        for writing_sub_list in writing_list:
            # 's' rather than 'str', to avoid shadowing the built-in type
            for s in writing_sub_list:
                print(s)
# this is exciting
# you are good
# he died
# go ahead

Then to convert to something smaller (but hard to read), rewrite the above code like this. It should be easy to see how to go from one to the other:

strings = [s for d in data for d2 in d['data'] for wsl in d2['writing'] for s in wsl]
# ['this is exciting', 'you are good', 'he died', 'go ahead']
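
As a quick check of that correspondence, the sketch below runs both forms on the sample data and compares the results. The names nested_result and flat_result are introduced here for illustration only. The for clauses of the comprehension appear in the same order as the nested loops, outermost first.

```python
# Sample data from the question (elided entries left out).
data = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'test_subject166', 'class': '?',
     'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]},
]

# Nested-loop form: one loop per layer of the structure.
nested_result = []  # hypothetical name, for illustration
for d in data:
    for d2 in d['data']:
        for wsl in d2['writing']:
            for s in wsl:
                nested_result.append(s)

# Comprehension form: the same for clauses in the same order.
flat_result = [s for d in data for d2 in d['data'] for wsl in d2['writing'] for s in wsl]

print(nested_result == flat_result)  # True
```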

Then, make it pretty with better names like Willem's answer:

[s for dic in data
   for data_dict in dic['data']
   for writing_sub_list in data_dict['writing']
   for s in writing_sub_list]

Upvotes: 2

willeM_ Van Onsem

Reputation: 477533

Assuming your original list is named input, simply use a list comprehension:

[elem for dic in input
      for dat in dic.get('data',())
      for writing in dat.get('writing',())
      for elem in writing]

Using .get(.., ()) means this still works when a key is missing: .get then returns the empty tuple (), so that loop performs no iterations.

Based on your sample input, we get:

>>> input = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]]}  ]  },
...       {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ] ]}  ] }]
>>> 
>>> [elem for dic in input
...       for dat in dic.get('data',())
...       for writing in dat.get('writing',())
...       for elem in writing]
['this is exciting', 'you are good', 'he died', 'go ahead']
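
To illustrate the fallback, here is a small sketch with hypothetical records (not taken from the question) in which some keys are missing. The name records and its sample values are assumptions made for the example.

```python
# Hypothetical input: only the first record has the full structure.
records = [
    {'uid': 'a', 'data': [{'chunk': 1, 'writing': [['first'], ['second']]}]},
    {'uid': 'b'},                          # no 'data' key at all
    {'uid': 'c', 'data': [{'chunk': 3}]},  # no 'writing' key
]

# .get(key, ()) yields an empty tuple for a missing key,
# so the corresponding loop simply runs zero times.
sentences = [elem for dic in records
                  for dat in dic.get('data', ())
                  for writing in dat.get('writing', ())
                  for elem in writing]
print(sentences)  # ['first', 'second']
```

Note that .get only covers a missing key. A key that is present but holds None would still raise a TypeError when iterated.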

Upvotes: 3

A. N. Other

Reputation: 407

I believe the code below will work:

lista = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
          {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

list_of_final_products = []

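for itema in lista:
  try:
    for data_item in itema['data']:
      for writa in data_item['writing']:
        for writa_itema in writa:
          # append each sentence string, not the enclosing sub-list
          list_of_final_products.append(writa_itema)
  except (KeyError, TypeError):
    # skip entries that lack the expected 'data' / 'writing' structure
    pass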

I believe this related question is helpful for understanding the approach: "python getting a list of value from list of dict" (thank you to McGrady).

Upvotes: 1
