Reputation: 67
I'm new to Python and pandas, and I would like to convert a list of lists (which contains information I extracted from a bunch of files) into individual columns. I have checked quite a lot of posts on Stack Overflow and haven't found anything that works for me so far. However, if you have come across anything similar, please post the link in the comments.
I have a Dataframe like this (a representative example):
df:
ID Values_a
0 1 [[1,20.1],[2,20.2]]
1 7 [[1,30.1],[2,30.2]]
Both lists ([[1,20.1],[2,20.2]] and [[1,30.1],[2,30.2]]) have the same length (and always will), but the integers in the lists (1 and 2) can be any numbers.
And I would like to convert df into a dataframe like this:
Label 1(Number of the 1st ID) 7(Number of the 2nd ID)
1 20.1 30.1
2 20.2 30.2
Where there will be three columns:
Label contains the first number of each inner list (so in this case, the integers 1 and 2).
1 has the first ID number as its column title and contains the second value of each inner list (20.1, 20.2).
7 has the second ID number as its column title and contains the corresponding values (30.1, 30.2).
First, I used .apply(pd.Series) to split the list of lists to get something like this (which I call df2):
df2:
ID 0 1
0 1 [1,20.1] [2,20.2]
1 7 [1,30.1] [2,30.2]
I thought I could use the same trick (.apply(pd.Series)) to split the columns again to get something like this:
ID 0 1 2 3
0 1 1 20.1 2 20.2
1 7 1 30.1 2 30.2
And then figure out how to get from here to where I want to be.
I have written something like this to split the list again:
names = [x for x in df2.columns]
for name in names:
    df3 = df2[name].apply(pd.Series)
    print(df3)
In Jupyter notebook, I get the following result (when I print df3 inside the for loop to check the output):
0 1
0 1.0 20.1
1 2.0 20.2
0 1
0 1.0 30.1
1 2.0 30.2
If I do df3.info() in the for loop, it tells me that I have two dataframes in df3. (Is this normal???)
If I call df3, this is what I get:
0 1
0 1.0 30.1
1 2.0 30.2
It seems like I'm overwriting df3 rather than appending new data to df3.
So:
How can I get around this problem? (Maybe create a new dataframe and append the split columns to the new dataframe?)
How can I transform df3 into the DataFrame I want? I have a feeling that I need to reshape my dataframe, but I'm not sure how to do so.
Any advice and suggestions will be greatly appreciated!
Upvotes: 2
Views: 2270
Reputation: 6652
Based on the structure of the data in the column Values_a, here is a possible workaround:
import pandas as pd

x = pd.DataFrame({'ID': [1, 7],
                  'Values_a': [[[1, 20.1], [2, 20.2]],
                               [[1, 30.1], [2, 30.2]]]})
# Map each ID to the second element of every pair in its Values_a list
data = {ID: [v[1] for v in x.loc[x['ID'] == ID, 'Values_a'].values[0]]
        for ID in x['ID']}
# Take the row labels (first element of each pair) from the first row
index = [v[0] for v in x['Values_a'].iloc[0]]
y = pd.DataFrame(data, index=index)
print(y)
1 7
1 20.1 30.1
2 20.2 30.2
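To match the layout asked for in the question, with Label as an ordinary column rather than the index, the index of y can be named and reset. A minimal sketch; y is rebuilt here from the values printed above so the snippet runs on its own, and the name result is only illustrative:

```python
import pandas as pd

# Same content as y above, rebuilt so this snippet is self-contained
y = pd.DataFrame({1: [20.1, 20.2], 7: [30.1, 30.2]}, index=[1, 2])

# Name the index "Label", then move it into a regular column
result = y.rename_axis('Label').reset_index()
print(result)
```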
Though I believe there is a simpler and more elegant solution with groupby.
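As for the loop in the question: df3 is reassigned on every pass, so only the last split survives. One possible way around that (not the groupby approach mentioned above) is to collect one Series per row in a list and join them with pd.concat. This is a sketch that assumes the data looks like the example in the question; the names pieces, row_id, pairs and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 7],
                   'Values_a': [[[1, 20.1], [2, 20.2]],
                                [[1, 30.1], [2, 30.2]]]})

pieces = []  # collect one Series per row instead of overwriting a variable
for row_id, pairs in zip(df['ID'], df['Values_a']):
    # dict(pairs) maps each label to its value, e.g. {1: 20.1, 2: 20.2}
    pieces.append(pd.Series(dict(pairs), name=row_id))

# Put the Series side by side; rows are aligned on the label
result = pd.concat(pieces, axis=1).rename_axis('Label').reset_index()
print(result)
```

Because pd.concat aligns the Series on their index, this should also cope with rows that list their labels in a different order.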
Upvotes: 2