Jason

Reputation: 482

Parsing a list of dictionaries, nested lists, and tuples to produce a new list

I have the following list that I would like to convert into a pandas DataFrame:

data = [
            {'ts_raw_c:TSLA': [
                {'ticker': 'TSLA', 'type': 'close'}, [
                    (1546405200000, 62.024),
                    (1546491600000, 60.072)
                    ]
                ]
            },
            {'ts_raw_h:TSLA': [
                {'ticker': 'TSLA', 'type': 'high'}, [
                    (1546405200000, 63.026), 
                    (1546491600000, 61.88)
                    ]
                ]
            },
            {'ts_raw_l:TSLA': [
                {'ticker': 'TSLA', 'type': 'low'}, [
                    (1546405200000, 59.76), 
                    (1546491600000, 59.476)
                    ]
                ]
            },
            {'ts_raw_o:TSLA': [
                {'ticker': 'TSLA', 'type': 'open'}, [
                    (1546405200000, 61.22), 
                    (1546491600000, 61.4)
                    ]
                ]
            }
        ]

Desired DataFrame output:

                close   high    low     open
1546405200000   62.024  63.026  59.76   61.22
1546491600000   60.072  61.88   59.476  61.4
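This is one way to reach exactly this shape, offered as a sketch rather than the approach taken in the code below. It flattens every (timestamp, price) tuple into a long-format record tagged with its 'type', then lets pandas pivot the 'type' values into columns. The record field names ts, type and price are arbitrary choices made for illustration.

```python
import pandas as pd

# Same structure as `data` above, written more compactly.
data = [
    {'ts_raw_c:TSLA': [{'ticker': 'TSLA', 'type': 'close'},
                       [(1546405200000, 62.024), (1546491600000, 60.072)]]},
    {'ts_raw_h:TSLA': [{'ticker': 'TSLA', 'type': 'high'},
                       [(1546405200000, 63.026), (1546491600000, 61.88)]]},
    {'ts_raw_l:TSLA': [{'ticker': 'TSLA', 'type': 'low'},
                       [(1546405200000, 59.76), (1546491600000, 59.476)]]},
    {'ts_raw_o:TSLA': [{'ticker': 'TSLA', 'type': 'open'},
                       [(1546405200000, 61.22), (1546491600000, 61.4)]]},
]

# Flatten to one (timestamp, type, price) record per tuple.
# Each dict value is a [metadata, points] pair, unpacked as meta, points.
records = [
    (ts, meta['type'], price)
    for entry in data
    for meta, points in entry.values()
    for ts, price in points
]

long_df = pd.DataFrame(records, columns=['ts', 'type', 'price'])
# pivot() turns each distinct 'type' into a column, indexed by timestamp.
wide_df = long_df.pivot(index='ts', columns='type', values='price')
print(wide_df)
```

pivot() sorts the column labels. Here that happens to match the desired close/high/low/open order. The resulting index and columns also carry the names ts and type.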

I think the appropriate way to create the DataFrame is:

df = pandas.DataFrame(df_column_values, index=df_index, columns=df_column_names)
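In this form of the constructor, the first argument is read row by row. Each inner list becomes one row, so the outer list needs one entry per index label and each inner list needs one value per column name. Here is a tiny illustration with made-up values:

```python
import pandas as pd

# Two inner lists -> two rows; three values in each -> three columns.
rows = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(rows, index=['r1', 'r2'], columns=['a', 'b', 'c'])
print(df)
```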

To that end, the following code builds df_index and df_column_names correctly. What I'm stuck on is the part that walks through each nested list of dictionaries, and its list of tuples, to piece together df_column_values.

My attempts always end up producing nested lists that are as wide as the number of index values (timestamps), not as wide as the number of columns.

# so.py  (assumes `data` is the list defined above)
df_index = []
df_column_names = []
df_column_values = []

all_price_values_per_price_label = {}
for line in data:
    for key_name in line:
        # each value is [{'ticker': ..., 'type': ...}, [(timestamp, price), ...]]
        meta, points = line[key_name]
        price_label = meta['type']  # 'close', 'high', 'low' or 'open'
        df_column_names.append(price_label)

        all_price_values_per_price_label[price_label] = []

        for timestamp, price in points:
            if timestamp not in df_index:
                df_index.append(timestamp)
            all_price_values_per_price_label[price_label].append(price)

# one inner list per price label, i.e. one list per column
for price_label in all_price_values_per_price_label:
    df_column_values.append(all_price_values_per_price_label[price_label])

print(df_index)
print(df_column_names)
print(df_column_values)

# python3 so.py
[1546405200000, 1546491600000]
['close', 'high', 'low', 'open']
[[62.024, 60.072], [63.026, 61.88], [59.76, 59.476], [61.22, 61.4]]
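Those lists are column-oriented, with one inner list per price type. The constructor call above, however, reads its first argument as rows. Passed in unchanged, they give pandas four rows of two values, while index and columns describe two rows of four, so pandas rejects the mismatch. The quick check below shows this. The exact error message varies between pandas versions, so it only relies on the exception type.

```python
import pandas as pd

df_index = [1546405200000, 1546491600000]
df_column_names = ['close', 'high', 'low', 'open']
# Column-oriented: one inner list per price type, as printed above.
df_column_values = [[62.024, 60.072], [63.026, 61.88], [59.76, 59.476], [61.22, 61.4]]

try:
    pd.DataFrame(df_column_values, index=df_index, columns=df_column_names)
    shape_error = False
except ValueError as exc:
    # pandas sees 2 values per row but 4 column names
    shape_error = True
    print('ValueError:', exc)
```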

For the constructor call above to work, df_column_values would need to be row-oriented, with one inner list per timestamp, like so:

df_column_values = [[62.024, 63.026, 59.76, 61.22], [60.072, 61.88, 59.476, 61.4]]
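One way to get from the column-oriented lists to that row-oriented shape is the built-in zip(*...) transpose idiom. It pairs up the first element of every inner list, then the second, and so on.

```python
import pandas as pd

df_index = [1546405200000, 1546491600000]
df_column_names = ['close', 'high', 'low', 'open']
df_column_values = [[62.024, 60.072], [63.026, 61.88], [59.76, 59.476], [61.22, 61.4]]

# zip(*lists) transposes: one tuple per timestamp, one value per price type.
rows = [list(row) for row in zip(*df_column_values)]
print(rows)  # [[62.024, 63.026, 59.76, 61.22], [60.072, 61.88, 59.476, 61.4]]

df = pd.DataFrame(rows, index=df_index, columns=df_column_names)
print(df)
```

This assumes every price type has a value for every timestamp, in the same order. If the lists differ in length, zip silently truncates to the shortest one.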

Upvotes: 0

Views: 105

Answers (1)

XxJames07-

Reputation: 1826

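You could zip the column names together with the column-oriented values into a dictionary that maps each column name to its list of values. Then pass that to DataFrame, which reads a dict column by column: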

from pandas import DataFrame

# dict maps each column name to its list of values; index labels the rows
df = DataFrame(dict(zip(df_column_names, df_column_values)), index=df_index)
print(df)

Output:

                close    high     low   open
1546405200000  62.024  63.026  59.760  61.22
1546491600000  60.072  61.880  59.476  61.40
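If the intermediate lists are not needed at all, an actual dictionary comprehension can build the columns straight from data. Calling dict(points) turns each list of (timestamp, price) tuples into a {timestamp: price} mapping, and pandas aligns those mappings on their keys to form the index. This is a sketch that assumes each dictionary in data holds exactly the [metadata, points] pair shown in the question.

```python
import pandas as pd

# Same structure as `data` in the question, written more compactly.
data = [
    {'ts_raw_c:TSLA': [{'ticker': 'TSLA', 'type': 'close'},
                       [(1546405200000, 62.024), (1546491600000, 60.072)]]},
    {'ts_raw_h:TSLA': [{'ticker': 'TSLA', 'type': 'high'},
                       [(1546405200000, 63.026), (1546491600000, 61.88)]]},
    {'ts_raw_l:TSLA': [{'ticker': 'TSLA', 'type': 'low'},
                       [(1546405200000, 59.76), (1546491600000, 59.476)]]},
    {'ts_raw_o:TSLA': [{'ticker': 'TSLA', 'type': 'open'},
                       [(1546405200000, 61.22), (1546491600000, 61.4)]]},
]

# {'close': {ts: price, ...}, 'high': {...}, ...}
columns = {
    meta['type']: dict(points)
    for entry in data
    for meta, points in entry.values()
}
# Outer keys become columns; inner keys (timestamps) become the index.
df = pd.DataFrame(columns)
print(df)
```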

Upvotes: 1
