haneulkim

Reputation: 4928

Efficient way of looping through list of dictionaries and appending items into column in dataframe

Here is a minimal reproducible example:

data = [
    {'1':20},
    {'1':10},
    {'1':40},
    {'1':14},
    {'1':33}
]

What I am trying to do is loop through each dictionary and append each value to a column in a dataframe.

Right now I am doing:

import pandas as pd
lst = []
for item in data:
    lst.append(item['1'])

df = pd.DataFrame({"col1":lst})

which outputs:

    col1
0   20
1   10
2   40
3   14
4   33

This is the output I want. However, I have over 1M dictionaries in the list. Is this the most efficient way?

EDIT: pd.DataFrame(data).rename(columns={'1':'col1'}) works perfectly for the case above. But what if the data looks like this?

data = [
    {'1':
     {'value':20}},
    {'1':
     {'value':10}},
    {'1':
      {'value':40}},
    {'1':
      {'value':14}},
    {'1':
      {'value':33}}]

In that case I would use:

lst = []
for item in data:
    lst.append(item['1']['value'])

df = pd.DataFrame({"col1":lst})

Is there a more efficient way for a list of dictionaries that themselves contain dictionaries?

Upvotes: 9

Views: 779

Answers (2)

jezrael

Reputation: 863291

One idea is to pass data to the DataFrame constructor and then use rename:

df = pd.DataFrame(data).rename(columns={'1':'col1'})
print (df)
   col1
0    20
1    10
2    40
3    14
4    33

If you need to filter or select values, use a list comprehension and pass the columns parameter:

df = pd.DataFrame([x['1'] for x in data], columns=['col1'])
print (df)
   col1
0    20
1    10
2    40
3    14
4    33
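Since the question asks which approach is most efficient, one way to check on your own machine is to time the variants. This is only a rough sketch: the synthetic data, its size, the helper function names and the number of repetitions are all assumptions made for illustration, and results will vary with the pandas version and hardware.

```python
import timeit

import pandas as pd

# Synthetic data shaped like the question's example (size chosen arbitrarily)
data = [{'1': i} for i in range(50_000)]


def with_loop():
    # The original approach from the question: explicit loop with append
    lst = []
    for item in data:
        lst.append(item['1'])
    return pd.DataFrame({'col1': lst})


def with_comprehension():
    # List comprehension plus the columns parameter
    return pd.DataFrame([x['1'] for x in data], columns=['col1'])


def with_constructor():
    # Pass the list of dicts straight to the constructor, then rename
    return pd.DataFrame(data).rename(columns={'1': 'col1'})


for fn in (with_loop, with_comprehension, with_constructor):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.3f}s for 3 runs")
```

All three functions build the same DataFrame, so whichever is fastest for your data can be swapped in safely.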

EDIT: For the new nested data, use:

data = [
    {'1':
     {'value':20}},
    {'1':
     {'value':10}},
    {'1':
      {'value':40}},
    {'1':
      {'value':14}},
    {'1':
      {'value':33}}]

df = pd.DataFrame([x['1']['value'] for x in data], columns=['col1'])
print (df)
   col1
0    20
1    10
2    40
3    14
4    33

Or:

df = pd.DataFrame([x['1'] for x in data]).rename(columns={'value':'col1'})
print (df)
   col1
0    20
1    10
2    40
3    14
4    33
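Another option for nested dictionaries, not mentioned above, is pd.json_normalize, which flattens nested keys into column names joined by a dot. This is a sketch of that alternative and makes no claim about speed; the variable names flat and df_flat are only illustrative.

```python
import pandas as pd

data = [
    {'1': {'value': 20}},
    {'1': {'value': 10}},
    {'1': {'value': 40}},
    {'1': {'value': 14}},
    {'1': {'value': 33}},
]

# json_normalize flattens nested dicts; nested keys are joined with "." by default,
# so the single resulting column is named "1.value"
flat = pd.json_normalize(data)

# Rename the flattened column to the name used in the question
df_flat = flat.rename(columns={'1.value': 'col1'})
print(df_flat)
```

This may be convenient when the inner dictionaries carry several keys, because each key becomes its own column without further code.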

Upvotes: 3

U13-Forward

Reputation: 71610

@jezrael's answer is correct, but if you just want to prepend col to the existing column names, you can use add_prefix:

df = pd.DataFrame(data)
print(df.add_prefix('col'))

Output:

   col1
0    20
1    10
2    40
3    14
4    33
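Note that add_prefix applies to every column, so it also covers dictionaries with more than one key. A small sketch, where the second key '2' and its values are hypothetical and not part of the question's data:

```python
import pandas as pd

# Hypothetical data with two keys per dictionary (the question only has '1')
data = [
    {'1': 20, '2': 5},
    {'1': 10, '2': 7},
]

df = pd.DataFrame(data)

# add_prefix returns a new DataFrame; assign it to keep the renamed columns
df = df.add_prefix('col')
print(df.columns.tolist())
```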

Upvotes: 2
