jessirocha
jessirocha

Reputation: 537

Separate a column that contains a list of dictionaries into a new dataframe keeping the old index

I have a dataframe that after applying some filters looks like this:

index  A   ....  J  
55     7   .... [{'sqlStatement': 'DELETE FROM Z WHERE D=2000', 'number': 200, 'time':3556, 'timestamp': 'Jun 13, 2017 5:41:22 PM' }, {'sqlStatement': 'DELETE FROM U WHERE Z=100', 'number': 450, 'time':8906, 'timestamp': 'Jun 13, 2017 5:49:22 PM'}, {'sqlStatement': 'DELETE FROM U WHERE Z=150', 'number': 270, 'time':9806, 'timestamp': 'Jun 13, 2017 5:58:45 PM'}]
193    7   .... [{'sqlStatement': 'DELETE FROM T WHERE F=98', 'number': 8043, 'time':463465, 'timestamp': 'Jun 13, 2017 6:01:22 PM' }, {'sqlStatement': 'DELETE FROM F WHERE A=98 AND Z=100 ', 'number': 9890, 'time':487569, 'timestamp': 'Jun 13, 2017 6:09:28 PM'}]

I need to separate the column J into a new dataframe. For this, I use the following code:

for i, (k, v) in enumerate (df['J'].items()):
    df = pd.DataFrame(v)

And I obtain:

index  sqlStatement                       number  time    timestamp
1     DELETE FROM Z WHERE D=2000          200     3556    Jun 13, 2017 5:41:22 PM
2     DELETE FROM U WHERE Z=100           450     8906    Jun 13, 2017 5:41:22 PM
3     DELETE FROM U WHERE Z=150           270     9806    Jun 13, 2017 5:58:45 PM
4     DELETE FROM T WHERE F=98            8043    463465  Jun 13, 2017 6:01:22 PM
5     DELETE FROM T WHERE F=98 AND Z=100  9890    487569  Jun 13, 2017 6:09:28 PM

The problem is that I would like to add a column containing the index of the observation that generated these new ones. What I would like to achieve is:

  index   sqlStatement                        number   time    timestamp               old_index
    1     DELETE FROM Z WHERE D=2000           200     3556    Jun 13, 2017 5:41:22 PM  55
    2     DELETE FROM U WHERE Z=100            450     8906    Jun 13, 2017 5:41:22 PM  55
    3     DELETE FROM U WHERE Z=150            270     9806    Jun 13, 2017 5:58:45 PM  55
    4     DELETE FROM T WHERE F=98             8043    463465  Jun 13, 2017 6:01:22 PM  193
    5     DELETE FROM T WHERE F=98 AND Z=100   9890    487569  Jun 13, 2017 6:09:28 PM  193

Can you please help me?

Upvotes: 0

Views: 43

Answers (1)

Trenton McKinney
Trenton McKinney

Reputation: 62533

No loops plz:

Data:

j = [[{'number': 200,
       'sqlStatement': 'DELETE FROM Z WHERE D=2000',
       'time': 3556,
       'timestamp': 'Jun 13, 2017 5:41:22 PM'},
      {'number': 450,
       'sqlStatement': 'DELETE FROM U WHERE Z=100',
       'time': 8906,
       'timestamp': 'Jun 13, 2017 5:49:22 PM'},
      {'number': 270,
       'sqlStatement': 'DELETE FROM U WHERE Z=150',
       'time': 9806,
       'timestamp': 'Jun 13, 2017 5:58:45 PM'}],
     [{'number': 8043,
       'sqlStatement': 'DELETE FROM T WHERE F=98',
       'time': 463465,
       'timestamp': 'Jun 13, 2017 6:01:22 PM'},
      {'number': 9890,
       'sqlStatement': 'DELETE FROM F WHERE A=98 AND Z=100 ',
       'time': 487569,
       'timestamp': 'Jun 13, 2017 6:09:28 PM'}]]

Code:

df = pd.DataFrame({'J': j})

enter image description here

Explode the lists to separate lines:

df_explode = df.explode('J')

enter image description here

pd.Series to expand the dicts:

df_explode = df_explode.J.apply(pd.Series)
df_explode.reset_index(inplace=True)
df_explode.rename(columns={'index': 'old_index'})

enter image description here

  • The original indices are maintained throughout.
  • Forego reset_index and df_explode.index will be the original index.

Upvotes: 1

Related Questions