zeroes_ones
zeroes_ones

Reputation: 191

Explode nested list of dictionaries into Pandas columns

I have Pandas dataframe created from a list of dictionaries that looks as follows:

import pandas as pd

raw_data = [{'type': 'message',
  'subtype': 'bot',
  'text': 'Direction: North\nDestination: New York\nNew Customer: true\n',
  'timestamp': '1650907136.554649',
  'bot_id': 'ABC'},
 {'type': 'message',
  'subtype': 'bot',
  'text': 'Direction: South\nDestination: Miami\nNew Customer: false\n',
  'timestamp': '1650907098.340029',
  'bot_id': 'DEF'},
 {'type': 'message',
  'subtype': 'bot',
  'text': 'Direction: East\nDestination: Los Angeles\nNew Customer: true\n',
  'timestamp': '1650906935.705729',
  'bot_id': 'XYZ'}]

df = pd.DataFrame(raw_data)

df

    type    subtype text                                                timestamp           bot_id
0   message bot     Direction: North\nDestination: New York\nNew C...   1650907136.554649   ABC
1   message bot     Direction: South\nDestination: Miami\nNew Cust...   1650907098.340029   DEF
2   message bot     Direction: East\nDestination: Los Angeles\nNew...   1650906935.705729   XYZ

I use the following code to parse the text column and create a Pandas dataframe consisting of the following:

df[['direction', 'destination', 'new_customer', 'foo']] = df['text'].str.split(pat='\n', expand=True)

    type    subtype text                                                timestamp           bot_id         direction           destination                  new_customer            foo
0   message bot     Direction: North\nDestination: New York\nNew C...   1650907136.554649   ABC            Direction: North    Destination: New York        New Customer: true  
1   message bot     Direction: South\nDestination: Miami\nNew Cust...   1650907098.340029   DEF            Direction: South    Destination: Miami           New Customer: false 
2   message bot     Direction: East\nDestination: Los Angeles\nNew...   1650906935.705729   XYZ            Direction: East     Destination: Los Angeles     New Customer: true  

I then strip some unnecessary detail from the columns:

df['direction'] = df['direction'].str.replace('Direction: ', '')
df['destination'] = df['destination'].str.replace('Destination: ', '')
df['new_customer'] = df['new_customer'].str.replace('New Customer: ', '')

This gives:

type  subtype   text                                                    timestamp           bot_id       direction     destination       new_customer   foo
0     message   bot Direction: North\nDestination: New York\nNew C...   1650907136.554649   ABC          North         New York          true   
1     message   bot Direction: South\nDestination: Miami\nNew Cust...   1650907098.340029   DEF          South         Miami             false  
2     message   bot Direction: East\nDestination: Los Angeles\nNew...   1650906935.705729   XYZ          East          Los Angeles       true   

So far, so good. Now, I have the following (more complicated) list of nested dictionaries:

raw_data_2 = [{'type': 'message',
  'subtype': 'bot_message',
  'text': "This content can't be displayed.",
  'timestamp': '1650905606.755969',
  'username': 'admin',
  'bot_id': 'BPD4K3SJW',
  'blocks': [{'type': 'section',
    'block_id': 'BJNTn',
    'text': {'type': 'mrkdwn',
     'text': 'You have a new message.',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'WPn/l',
    'text': {'type': 'mrkdwn',
     'text': '*Heard By*\nFriend',
     'verbatim': False}},
   {'type': 'section',
    'block_id': '5yp',
    'text': {'type': 'mrkdwn',
     'text': '*Which Direction? *\nNorth',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'fKEpF',
    'text': {'type': 'mrkdwn',
     'text': '*Which Destination*\nNew York',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'qjAH',
    'text': {'type': 'mrkdwn',
     'text': '*New Customer:*\Yes',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'IBr',
    'text': {'type': 'mrkdwn', 'text': ' ', 'verbatim': False}}]},
 {'type': 'message',
  'subtype': 'bot_message',
  'text': "This content can't be displayed.",
  'timestamp': '1650899428.077709',
  'username': 'admin',
  'bot_id': 'BPD4K3SJW',
  'blocks': [{'type': 'section',
    'block_id': 'Smd',
    'text': {'type': 'mrkdwn',
     'text': 'You have a new message.',
     'verbatim': False}},
   {'type': 'section',
    'block_id': '6YaLt',
    'text': {'type': 'mrkdwn',
     'text': '*Heard By*\nOnline Search',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'w3o',
    'text': {'type': 'mrkdwn',
     'text': '*Which Direction: *\nNorth',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'PTQ',
    'text': {'type': 'mrkdwn',
     'text': '*Which Destination? *\nMiami',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'JCfSP',
    'text': {'type': 'mrkdwn',
     'text': '*New Customer? *\nNo',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'RJOA',
    'text': {'type': 'mrkdwn', 'text': ' ', 'verbatim': False}}]}]

I put this into a dataframe df2 as follows:

df2 = pd.DataFrame(raw_data_2)

    type    subtype     text                                timestamp           username    bot_id  blocks
0   message bot_message This content can't be displayed.    1650905606.755969   admin   BPD4K3SJW   [{'type': 'section', 'block_id': 'BJNTn', 'tex...
1   message bot_message This content can't be displayed.    1650899428.077709   admin   BPD4K3SJW   [{'type': 'section', 'block_id': 'Smd', 'text'...

How would I parse the blocks column of df2 to achieve the following dataframe?

type     subtype      text                             timestamp          username  bot_id     heard_by   direction   destination   new_customer
message  bot_message  This content can't be displayed  1650905606.755969  admin     BPD4K3SJW  Friend      North      New York      Yes
message  bot_message  This content can't be displayed  1650899428.077709  admin     BPD4K3SJW  Online      North      Miami         No

Thanks!

EDIT

The solution provided by @onyambu works perfectly on raw_data_2, but not on the following raw_data_3 (which includes another block):

raw_data_3 = [{'type': 'message',
  'subtype': 'bot_message',
  'text': "This content can't be displayed.",
  'timestamp': '1650905606.755969',
  'username': 'admin',
  'bot_id': 'BPD4K3SJW',
  'blocks': [{'type': 'section',
    'block_id': 'BJNTn',
    'text': {'type': 'mrkdwn',
     'text': 'You have a new message.',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'WPn/l',
    'text': {'type': 'mrkdwn',
     'text': '*Heard By*\nFriend',
     'verbatim': False}},
   {'type': 'section',
    'block_id': '5yp',
    'text': {'type': 'mrkdwn',
     'text': '*Which Direction? *\nNorth',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'fKEpF',
    'text': {'type': 'mrkdwn',
     'text': '*Which Destination*\nNew York',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'qjAH',
    'text': {'type': 'mrkdwn',
     'text': '*New Customer:*\Yes',
     'verbatim': False}},
             
   {'type': 'actions',
    'block_id': 'yt4',
    'elements': [{'type': 'button',
    'action_id': '+bc',
    'text': {'type': 'plain_text', 'text': 'View results', 'emoji': True},
    'url': 'www.example.com'}]},
             
   {'type': 'section',
    'block_id': 'IBr',
    'text': {'type': 'mrkdwn', 'text': ' ', 'verbatim': False}}]},
 {'type': 'message',
  'subtype': 'bot_message',
  'text': "This content can't be displayed.",
  'timestamp': '1650899428.077709',
  'username': 'admin',
  'bot_id': 'BPD4K3SJW',
  'blocks': [{'type': 'section',
    'block_id': 'Smd',
    'text': {'type': 'mrkdwn',
     'text': 'You have a new message.',
     'verbatim': False}},
   {'type': 'section',
    'block_id': '6YaLt',
    'text': {'type': 'mrkdwn',
     'text': '*Heard By*\nOnline Search',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'w3o',
    'text': {'type': 'mrkdwn',
     'text': '*Which Direction: *\nNorth',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'PTQ',
    'text': {'type': 'mrkdwn',
     'text': '*Which Destination? *\nMiami',
     'verbatim': False}},
   {'type': 'section',
    'block_id': 'JCfSP',
    'text': {'type': 'mrkdwn',
     'text': '*New Customer? *\nNo',
     'verbatim': False}},

    {'type': 'actions',
    'block_id': 'yt4',
    'elements': [{'type': 'button',
      'action_id': '+bc',
      'text': {'type': 'plain_text', 'text': 'View results', 'emoji': True},
      'url': 'www.example.com'}]},
             
             
   {'type': 'section',
    'block_id': 'RJOA',
    'text': {'type': 'mrkdwn', 'text': ' ', 'verbatim': False}}]}]

It's throwing a key error:

KeyError: 'text'

Upvotes: 1

Views: 513

Answers (1)

Onyambu
Onyambu

Reputation: 79208

import re

d_new = (pd.DataFrame([[re.sub(".*[*]\\W+", "", val['text']['text']) 
               for val in dat['blocks']] for dat in raw_data_2]).
          drop([0, 5], axis = 1))

d_new.columns = ['heard_by', 'direction','destination', 'new_customer']

d_new
 
        heard_by direction destination new_customer
0         Friend     North    New York          Yes
1  Online Search     North       Miami           No

You can then append this to your original data

Upvotes: 1

Related Questions