SiSi

Reputation: 67

Pandas: Convert list of lists to multiple columns

I'm new to Python and pandas, and I would like to convert a list of lists (containing information I extracted from a bunch of files) into individual columns. I have checked quite a lot of posts on Stack Overflow and haven't found anything that works for me so far. However, if you have come across anything similar, please post the link in the comments.


I have a Dataframe like this (a representative example):

df:
        ID   Values_a
    0   1    [[1,20.1],[2,20.2]]
    1   7    [[1,30.1],[2,30.2]]

Both lists ([[1,20.1],[2,20.2]] and [[1,30.1],[2,30.2]]) have the same length (and always will); however, the integers in the lists (1 and 2) can be any numbers.

And I would like to convert df into a dataframe like this:

    Label    1 (number of the 1st ID)    7 (number of the 2nd ID)
    1        20.1                        30.1
    2        20.2                        30.2

There will be three columns: Label, plus one column per ID (here 1 and 7).


First, I used apply(pd.Series) to split the list of lists and got something like this (which I call df2):

df2:
        ID   0           1
    0   1    [1,20.1]    [2,20.2]
    1   7    [1,30.1]    [2,30.2]

I thought I could use the same trick (apply(pd.Series)) to split the columns again and get something like this:

        ID   0    1       2    3
    0   1    1    20.1    2    20.2
    1   7    1    30.1    2    30.2

And then figure out how to get from there to where I want to be.

I have written something like this to split the list again:

names = [x for x in df2.columns]

for name in names:
    df3 = df2[name].apply(pd.Series)
    print(df3)

In Jupyter Notebook, I get the following result (when I include print(df3) in the for loop to check the output):

      0     1
0    1.0   20.1
1    2.0   20.2
      0     1
0    1.0   30.1
1    2.0   30.2

If I call df3.info() inside the for loop, it tells me that I have two dataframes in df3. (Is this normal?)

If I call df3, this is what I get:

      0     1
0    1.0   30.1
1    2.0   30.2

It seems like I'm overwriting df3 on each iteration rather than appending new data to it.
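One way to keep every piece, rather than only the last one, might be to collect the expanded frames in a list and join them once after the loop. The sketch below is only an illustration under stated assumptions: df2 is rebuilt by hand from the table above, `pieces` and `expanded` are hypothetical names, and the list-valued columns are assumed to be the ones labelled 0 and 1.

```python
import pandas as pd

# df2 rebuilt by hand from the table above (an assumption about its exact contents):
# each cell in columns 0 and 1 still holds a [label, value] pair.
df2 = pd.DataFrame({
    "ID": [1, 7],
    0: [[1, 20.1], [1, 30.1]],
    1: [[2, 20.2], [2, 30.2]],
})

pieces = []  # hypothetical name: one expanded frame per list-valued column
for name in [0, 1]:  # skip "ID", which does not hold lists
    # Turn the column of pairs into a two-column frame, keeping the row index.
    expanded = pd.DataFrame(df2[name].tolist(), index=df2.index)
    pieces.append(expanded)

# Join ID and all pieces side by side. Assigning to df3 inside the loop
# would keep only the frame from the last iteration.
df3 = pd.concat([df2[["ID"]]] + pieces, axis=1)
df3.columns = ["ID", 0, 1, 2, 3]
print(df3)
```

Under these assumptions, df3 has one row per ID and the four split values next to it, which matches the intermediate table sketched earlier.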

So:

Any advice and suggestions will be greatly appreciated!

Upvotes: 2

Views: 2270

Answers (1)

Mr Tarsa

Reputation: 6652

Based on the structure of the data in the column Values_a, here is a possible workaround:

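>>> import pandas as pd
>>> x = pd.DataFrame({'ID': [1, 7],
...                   'Values_a': [[[1, 20.1], [2, 20.2]],
...                                [[1, 30.1], [2, 30.2]]]})
>>> # For each ID, collect the second element (the value) of every pair
>>> data = {ID: [v[1] for v in x.loc[x['ID'] == ID, 'Values_a'].values[0]]
...         for ID in x['ID']}
>>> # Use the first element (the label) of each pair in the first row as the index
>>> index = [v[0] for v in x['Values_a'].iloc[0]]
>>> y = pd.DataFrame(data, index=index)
>>> y
      1     7
1  20.1  30.1
2  20.2  30.2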

That said, I believe a simpler and more elegant solution exists using groupby.
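For comparison, here is a minimal sketch of a different route that uses explode and pivot rather than groupby, so it is not the groupby solution mentioned above. It assumes pandas 0.25 or later (for DataFrame.explode), and the column names "Label" and "Value" are illustrative choices, not part of the original data. Unlike the workaround above, it reads each row's own labels instead of taking them from the first row only.

```python
import pandas as pd

x = pd.DataFrame({
    "ID": [1, 7],
    "Values_a": [[[1, 20.1], [2, 20.2]],
                 [[1, 30.1], [2, 30.2]]],
})

# One row per [label, value] pair; the ID is repeated for each pair.
long = x.explode("Values_a").reset_index(drop=True)

# Split each pair into two columns ("Label" and "Value" are illustrative names).
pairs = pd.DataFrame(long["Values_a"].tolist(), columns=["Label", "Value"])
long = pd.concat([long[["ID"]], pairs], axis=1)

# Reshape: labels become the index, IDs become the columns.
y = long.pivot(index="Label", columns="ID", values="Value")
print(y)
```

Note that pivot raises an error if the same label appears twice for one ID, so this sketch assumes labels are unique within each row's list.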

Upvotes: 2
