Creating pandas dataframe from list of dictionaries containing lists of data

Question

I have a list of dictionaries with this structure.

    {
        'data' : [[year1, value1], [year2, value2], ... m entries],
        'description' : string,
        'end' : string,
        'f' : string,
        'lastHistoricalperiod' : string, 
        'name' : string,
        'series_id' : string,
        'start' : int,
        'units' : string,
        'unitsshort' : string,
        'updated' : string
    }

I want to put this in a pandas DataFrame that looks like

   year       value  updated                   (other dict keys ... )
0  2040  120.592468  2014-05-23T12:06:16-0400  other key-values
1  2039  120.189987  2014-05-23T12:06:16-0400  ...
2  other year-value pairs ...
...
n

where n = m * len(list of dictionaries), and m is the length of each 'data' list.

That is, each [year, value] pair in 'data' should have its own row. What I've done thus far is this:

    import pandas as pd

    x = [...]  # list of dictionaries as described above
    # Create an empty DataFrame
    output = pd.DataFrame()

    # Loop through each dictionary in the list
    for dictionary in x:
        # Create a new DataFrame from the 2-D list alone.
        data = dictionary['data']
        y = pd.DataFrame(data, columns=['year', 'value'])
        # Loop through all the other dictionary key-value pairs and fill in values
        for key in dictionary:
            if key != 'data':
                y[key] = dictionary[key]
        # Concatenate the accumulated output with the DataFrame from this dictionary.
        output = pd.concat([output, y], ignore_index=True)

This seems very hacky, and I was wondering if there's a more 'pythonic' way to do this, or at least if there are any obvious speedups here.

ZJS · Accepted Answer

The difficulty with your data is the 'data' key of each dictionary, which holds a nested list of [year, value] pairs.

If your data is in the form [{}, {}, ...], you can do the following:

    import pandas as pd

    # data is the list of dictionaries (x in the question)
    df = pd.DataFrame(data)
    fix = df.groupby(level=0)['data'].apply(lambda x: pd.DataFrame(x.iloc[0], columns=['Year', 'Value']))
    fix = fix.reset_index(level=1, drop=True)
    df = pd.merge(fix, df.drop(columns=['data']), how='inner', left_index=True, right_index=True)

The code does the following:

Creates a DataFrame from your list of dictionaries, with one row per dictionary.
Creates a new DataFrame by stretching out the data column into one row per [Year, Value] pair.
The stretching step produces a MultiIndex with an irrelevant inner level; reset_index removes it.
Finally, merges on the original index to get the desired DataFrame.
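
For comparison, the same steps can be sketched with DataFrame.explode, which needs a reasonably recent pandas (ignore_index was added to explode in 1.1). This is not the code from the answer above, only an alternative under stated assumptions: the sample records, their values, and the names records, pairs and result are made up for illustration, and every 'data' list is assumed to be non-empty.

```python
import pandas as pd

# Hypothetical sample input with the same shape as the question's dictionaries
# (only a few of the keys are included to keep the example short).
records = [
    {'data': [[2040, 120.5], [2039, 120.1]], 'name': 'series A', 'units': 'quads'},
    {'data': [[2040, 30.0], [2039, 29.5]], 'name': 'series B', 'units': 'quads'},
]

# One row per dictionary; the 'data' column holds a list of [year, value] pairs.
df = pd.DataFrame(records)

# Stretch each [year, value] pair onto its own row, repeating the other columns.
df = df.explode('data', ignore_index=True)

# Split the pairs into two columns, then attach the remaining columns.
# Assumes no 'data' list was empty (explode would leave NaN in that case).
pairs = pd.DataFrame(df['data'].tolist(), columns=['year', 'value'], index=df.index)
result = pd.concat([pairs, df.drop(columns=['data'])], axis=1)

print(result)
```

Because explode repeats the other columns automatically, no merge step is needed here; the split into year and value columns replaces the groupby/apply stretching.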
