nat5142

Reputation: 495

Iterate over nested lists of dictionaries using list comprehension

I have a bunch of text files containing meteorological data. Each text file stores a half hour's worth of data, which is 18000 observations (lines). There are 48 files in total (a full day), and I've stored all of the data in the following structure:

# all_data is a list of dictionaries, len=48 --> each dict represents one file

all_data = [{'time': '0026',
             'filename': 'file1.txt',
             # all_data[i]['data'] is a list of dictionaries, len=18000
             # each dict in all_data[i]['data'] represents one line of the corresponding file
             'data': [{'x': 1.345, 'y': -0.779, 'z': 0.023, 'temp': 298.11},
                      {'x': 1.277, 'y': -0.731, 'z': 0.086, 'temp': 297.88},
                      ...,
                      {'x': 2.119, 'y': 1.332, 'z': -0.009, 'temp': 299.14}]
             },

             {'time': '0056',
              'filename': 'file2.txt',
              'data': [{'x': 1.216, 'y': -0.648, 'z': 0.881, 'temp': 301.11},
                      {'x': 0.866, 'y': 0.001, 'z': 0.031, 'temp': 301.32},
                      ...,
                      {'x': 0.181, 'y': 0.498, 'z': 0.101, 'temp': 300.91}]
             },
             ...
             ]

Now I need to unpack it. I need to create a list of all values of x (all_data[i]['data'][j]['x']) in sequential order to use for plotting. Fortunately, the data is already stored in sequential order.

I know that I can simply do something like this to achieve my goal:

x_list = []
for dictionary in all_data:
    for record in dictionary['data']: # loop over list of dictionaries
         x_list.append(record['x'])

But I have to do something similar for many variables that I did not list here for simplicity's sake, and I really don't want to rewrite this loop 20 times or hand-create 20 new lists.

Is there a way to iterate over a nested data structure like this using list comprehension?

I threw up a prayer and tried:

[x for x in all_data[i for i in len(all_data)]['data'][j for j in len(all_data[i]['data'])]

which of course didn't work. Any ideas?

Here's my desired output, which is just the values of 'x' in nested list 'data':

all_x = [1.345, 1.277, ..., 2.119, 1.216, 0.866, ..., 0.181, ...]

Thanks in advance!

Upvotes: 0

Views: 1706

Answers (4)

hunzter

Reputation: 598

If I understand you correctly, you want the output to be:

  1. a list
  2. in which each element is a sublist holding the values of the variables x, y, z and temp

not just a list of the x values.

Then this is your code:

values = [list(row.values()) for day in all_data for row in day['data']]

Each item in values is a list of the values of the variables x, y, z and temp for one observation, so values as a whole is a matrix with one row per observation.

For your above sample data, the output is:

[[-0.779, 1.345, 0.023, 298.11], [-0.731, 1.277, 0.086, 297.88], [1.332, 2.119, -0.009, 299.14], [-0.648, 1.216, 0.881, 301.11], [0.001, 0.866, 0.031, 301.32], [0.498, 0.181, 0.101, 300.91]]

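Each sublist follows the key order of the dictionaries. In the output above that order is ['y', 'x', 'z', 'temp'], because older Python versions do not preserve insertion order; on Python 3.7+ it would be ['x', 'y', 'z', 'temp'].

EDIT: if you want to extract the values for one variable, use numpy: convert the output to an array and extract the corresponding column.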
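As a possible pure-Python alternative to the numpy route, the sketch below names the keys explicitly so the column order does not depend on dictionary ordering, then transposes the rows with zip. The all_data sample is trimmed from the question, and the names keys, rows and columns are illustrative assumptions rather than part of the original answer.

```python
# Trimmed sample in the shape described in the question (illustrative only).
all_data = [
    {'filename': 'file1.txt',
     'data': [{'x': 1.345, 'y': -0.779, 'z': 0.023, 'temp': 298.11},
              {'x': 1.277, 'y': -0.731, 'z': 0.086, 'temp': 297.88}]},
    {'filename': 'file2.txt',
     'data': [{'x': 1.216, 'y': -0.648, 'z': 0.881, 'temp': 301.11}]},
]

keys = ['x', 'y', 'z', 'temp']  # assumed column order, chosen explicitly

# One row per observation, with values in the order given by `keys`.
rows = [[row[k] for k in keys] for day in all_data for row in day['data']]

# Transpose the rows into columns and label each column with its key.
columns = dict(zip(keys, (list(col) for col in zip(*rows))))

print(columns['x'])  # [1.345, 1.277, 1.216]
```

Looking a column up by name this way avoids having to remember which position each variable ended up in.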

Upvotes: 0

emunsing

Reputation: 9944

If you don't mind using Pandas, this can be a great way of accomplishing what you want. Running

dataDfList = [pandas.DataFrame(f['data']) for f in all_data]

will generate a list of DataFrames, each looking like:

|   | temp   | x     | y      | z      |
|---|--------|-------|--------|--------|
| 0 | 298.11 | 1.345 | -0.779 | 0.023  |
| 1 | 297.88 | 1.277 | -0.731 | 0.086  |
| 2 | 299.14 | 2.119 | 1.332  | -0.009 |

Each of these can then be easily plotted. You could also accomplish this with a MultiIndex, e.g. by stacking the list of DataFrames using pandas.concat(dataDfList).
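A minimal sketch of the concatenation idea, assuming pandas is installed and using a trimmed copy of the question's sample. It uses ignore_index=True to get one continuous row index instead of a MultiIndex, which is a simplification chosen here and not necessarily what the answer had in mind; the names frames, combined and all_x are illustrative.

```python
import pandas as pd

# Trimmed sample in the shape described in the question (illustrative only).
all_data = [
    {'filename': 'file1.txt',
     'data': [{'x': 1.345, 'y': -0.779, 'z': 0.023, 'temp': 298.11},
              {'x': 1.277, 'y': -0.731, 'z': 0.086, 'temp': 297.88}]},
    {'filename': 'file2.txt',
     'data': [{'x': 1.216, 'y': -0.648, 'z': 0.881, 'temp': 301.11}]},
]

# One DataFrame per file, then stack them in file order.
frames = [pd.DataFrame(f['data']) for f in all_data]
combined = pd.concat(frames, ignore_index=True)

# Any variable is now a single column covering the whole day.
all_x = combined['x'].tolist()
print(all_x)  # [1.345, 1.277, 1.216]
```

With the data in one frame, each of the other variables is available the same way, e.g. combined['temp'], without writing a separate loop per variable.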

Upvotes: 1

Transhuman

Reputation: 3547

from itertools import chain
[ k['x'] for k in chain.from_iterable([ i['data'] for i in all_data ]) ]
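Since the question mentions needing this for around 20 variables, the same chain.from_iterable idea could be extended roughly as follows. This is a sketch on a trimmed copy of the question's sample; the variables list and the series dictionary are hypothetical names, and the real field names would replace the four shown.

```python
from itertools import chain

# Trimmed sample in the shape described in the question (illustrative only).
all_data = [
    {'filename': 'file1.txt',
     'data': [{'x': 1.345, 'y': -0.779, 'z': 0.023, 'temp': 298.11},
              {'x': 1.277, 'y': -0.731, 'z': 0.086, 'temp': 297.88}]},
    {'filename': 'file2.txt',
     'data': [{'x': 1.216, 'y': -0.648, 'z': 0.881, 'temp': 301.11}]},
]

variables = ['x', 'y', 'z', 'temp']  # hypothetical list; extend with the real field names

# Flatten the per-file lists once, preserving file and line order.
records = list(chain.from_iterable(f['data'] for f in all_data))

# Build one flat list per variable, keyed by the variable name.
series = {name: [rec[name] for rec in records] for name in variables}

print(series['x'])  # [1.345, 1.277, 1.216]
```

Keeping the lists in a dictionary means a new variable only needs a new entry in variables, not a new loop or a new hand-named list.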

Upvotes: 2

Ajax1234

Reputation: 71451

You can try this:

import itertools
all_data = [{'time': '0026', 'filename': 'file1.txt', 'data': [{'x': 1.345, 'y': -0.779, 'z': 0.023, 'temp': 298.11}, {'x': 1.277, 'y': -0.731, 'z': 0.086, 'temp': 297.88}, {'x': 2.119, 'y': 1.332, 'z': -0.009, 'temp': 299.14}]},
        {'time': '0056', 'filename': 'file2.txt', 'data': [{'x': 1.216, 'y': -0.648, 'z': 0.881, 'temp': 301.11}, {'x': 0.866, 'y': 0.001, 'z': 0.031, 'temp': 301.32}, {'x': 0.181, 'y': 0.498, 'z': 0.101, 'temp': 300.91}]}]

x_data = list(itertools.chain.from_iterable([[b["x"] for b in i["data"]] for i in all_data]))
print(x_data)

Output:

[1.345, 1.277, 2.119, 1.216, 0.866, 0.181]

Upvotes: 1
