Harvey Koh
Harvey Koh

Reputation: 103

List of List of Dictionary to MultiIndex Dataframe with Values

I have a list of list of dictionary that I would like to be transformed into a dataframe but I can't seem to make my code work.

Currently, this is my example list

eglist = [[{'var1': False, 'name_id': 23, 'team': 0 , 'order':0},
           {'var1': False, 'name_id': 24, 'team': 0 , 'order':1}, 
           {'var1': False, 'name_id': 1, 'team': 0 , 'order':2}],
          [{'var1': False, 'name_id': 23, 'team': 0 , 'order':0},
           {'var1': False, 'name_id': 24, 'team': 0 , 'order':1}, 
           {'var1': False, 'name_id': 1, 'team': 0 , 'order':2}]]

The larger list has around 9000 lists of dictionary.

And I would like it to be transformed into

dfframe =  
      Order 0               Order 1                        Order 2 
0     var1  name_id  team   var1     name_id    team       var1   name_id    team
      False     23      0   False       24        0        False     1        0

1     var1  name_id  team   var1     name_id    team       var1   name_id    team`
      False     23      0   False       24        0        False     1        0

Any help would be great! I am a beginner and hence a little unsure on how to proceed.

This is a repost of a deleted post as I posted unrelated code in that post.

Upvotes: 2

Views: 228

Answers (1)

jpp
jpp

Reputation: 164623

You can follow these steps:

  1. Use collections.defaultdict to aggregate data by order key.
  2. Convert to a single dataframe using pd.concat over axis=1.
  3. Add level to columns via pd.MultiIndex.

Here's a demo:

from collections import defaultdict

# aggregate data by order
d = defaultdict(list)
for L in eglist:
    for row in L:
        d[row['order']].append([row['var1'], row['name_id'], row['team']])

# convert to dataframe
df = pd.concat([pd.DataFrame(v, columns=[f'{k}_var1', f'{k}_name_id', f'{k}_team'])
                for k, v in d.items()], axis=1)

# convert columns to MultiIndex
headers = [('Order '+col.split('_')[0], col.split('_')[-1]) for col in df.columns]
df.columns = pd.MultiIndex.from_tuples(headers, names=['Team', 'Data'])

print(df)

Team Order 0          Order 1          Order 2        
Data    var1  id team    var1  id team    var1 id team
0      False  23    0   False  24    0   False  1    0
1      False  23    0   False  24    0   False  1    0

Upvotes: 2

Related Questions