Pandas: Convert list of lists to multiple columns

Question

I'm new to Python and pandas, and I would like to convert a list of lists (which contains information I extracted from a bunch of files) into individual columns. I have checked quite a lot of posts on Stack Overflow and haven't found anything that works for me so far. However, if you have come across anything similar, please post the link in the comments.

I have a Dataframe like this (a representative example):

df:
       ID   Values_a
    0   1   [[1,20.1],[2,20.2]]
    1   7   [[1,30.1],[2,30.2]]

Both lists ([[1,20.1],[2,20.2]] and [[1,30.1],[2,30.2]]) have the same length (and always will), but the integers in the lists (1 and 2) can be any numbers.

And I would like to convert df into a dataframe like this:

  Label     1 (number of the 1st ID)     7 (number of the 2nd ID)
    1        20.1                         30.1
    2        20.2                         30.2

Where there will be three columns:

The first column (Label) contains the first number of each inner list (so in this case, the integers 1 and 2).
The second column (1) has the first ID number as its column title and contains the second value of each inner list (20.1, 20.2).
The third column contains the same information for ID number 7.

First, I used .apply(pd.Series) to split the list of lists to get something like this (which I call df2):

df2:
       ID   0            1
    0   1   [1,20.1]     [2,20.2]
    1   7   [1,30.1]     [2,30.2]

I thought I could use the same trick (.apply(pd.Series)) to split the columns again to get something like this:

       ID   0    1       2    3
    0   1   1    20.1    2    20.2
    1   7   1    30.1    2    30.2

And then figure out how to get from here to where I want to be.

I have written something like this to split the list again:

names = [x for x in df2.columns]

for name in names:
    df3 = df2[name].apply(pd.Series)  # df3 is rebound on every iteration
    print(df3)

In Jupyter Notebook, I get the following result (when I include print(df3) in the for loop to check the output):

      0     1
0    1.0   20.1
1    2.0   20.2
      0     1
0    1.0   30.1
1    2.0   30.2

If I do df3.info() in the for loop it tells me that I have two dataframes in df3. (Is this normal???)

If I call df3, this is what I get:

      0     1
0    1.0   30.1
1    2.0   30.2

It seems like I'm overwriting df3 rather than appending new data to it.

So:

How can I get around this problem? (Maybe create a new DataFrame and append the split columns to the new DataFrame?)
How can I transform df3 into the DataFrame I want? I have a feeling that I need to reshape my DataFrame; however, I'm not sure how to do so.

Any advice or suggestions would be greatly appreciated!

Mr Tarsa · Accepted Answer

Based on the structure of the data in the column Values_a, here is a possible workaround:

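>>> import pandas as pd
>>> x = pd.DataFrame({'ID': [1, 7],
...                   'Values_a': [[[1, 20.1], [2, 20.2]],
...                                [[1, 30.1], [2, 30.2]]]})
>>> # For each ID, collect the second element of every [label, value] pair.
>>> data = {ID: [v[1] for v in x.loc[x['ID'] == ID, 'Values_a'].values[0]]
...         for ID in x['ID']}
>>> # The labels (first elements) are read from the first row only.
>>> index = [v[0] for v in x['Values_a'].iloc[0]]
>>> y = pd.DataFrame(data, index=index)
>>> y
      1     7
1  20.1  30.1
2  20.2  30.2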
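
In the result above the labels sit in the index, whereas the question asks for a regular column called Label. A minimal sketch of that last step, assuming the same example data; the name labelled is only illustrative:

```python
import pandas as pd

# Rebuild the example frame and the result y from the workaround.
x = pd.DataFrame({'ID': [1, 7],
                  'Values_a': [[[1, 20.1], [2, 20.2]],
                               [[1, 30.1], [2, 30.2]]]})
data = {ID: [v[1] for v in x.loc[x['ID'] == ID, 'Values_a'].values[0]]
        for ID in x['ID']}
index = [v[0] for v in x['Values_a'].iloc[0]]
y = pd.DataFrame(data, index=index)

# Name the index 'Label', then move it into an ordinary column.
labelled = y.rename_axis('Label').reset_index()
print(labelled)
```

The column titles for the two IDs stay as the integers 1 and 7, as in the layout the question describes.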

Though, I believe there exists a simpler and more elegant solution with groupby.
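
One candidate for a shorter route is sketched here. It does not use groupby; it reshapes with explode and pivot instead, and it assumes pandas 0.25 or later (where DataFrame.explode is available) and that every inner list is a [label, value] pair. The names exploded, long, and wide are only illustrative.

```python
import pandas as pd

x = pd.DataFrame({'ID': [1, 7],
                  'Values_a': [[[1, 20.1], [2, 20.2]],
                               [[1, 30.1], [2, 30.2]]]})

# One row per inner [label, value] pair; the ID is repeated for each pair.
exploded = x.explode('Values_a')

# Split each pair into two named columns and carry the matching ID along.
long = pd.DataFrame(exploded['Values_a'].tolist(), columns=['Label', 'Value'])
long['ID'] = exploded['ID'].to_numpy()

# Reshape: one row per label, one column per ID.
wide = long.pivot(index='Label', columns='ID', values='Value')
print(wide)
```

Because the labels are matched through pivot rather than taken from the first row, this version would also line up rows whose labels appear in a different order. pivot raises an error if the same label occurs twice for one ID.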
