Reputation: 173
I have a pandas DataFrame where one column holds tuples, each containing a nested tuple. The nested tuple holds two existing ids. I want to explode every element of the whole tuple, including the nested one, into new appended columns. Here's my df so far:
df
   id1  id2  tuple_of_tuple
0  a    e    ('cat',100,('a','f'))
1  b    f    ('dog',100,('b','g'))
2  c    g    ('cow',100,('d','h'))
3  d    h    ('tree',100,('c','e'))
I was trying to implement the code below on a small subset of data, and it seemed to work. There were new appended columns with each extracted/exploded element where it needed to be.
df[['Link_1', 'Link_2','Link_3','Link_4']] = df['tuple_of_tuple'].apply(pd.Series)
But when I apply it to the entire dataset, I get the error "ValueError: Columns must be same length as key". (I should mention that there are a couple of NaNs scattered around: in some rows the whole tuple_of_tuple entry is just NaN.) How can I fix this?
Upvotes: 1
Views: 1763
Reputation: 402533
Here's an extremely elegant way to do it using Python's * unpacking operator (unpacking inside a list literal requires Python 3.5+):
df2 = pd.DataFrame(
    data=[[*i, *j] for *i, j in df.pop('tuple_of_tuple')],
    columns=['link_1', 'link_2', 'link_3', 'link_4']
)
You can then join df2 with df using pd.concat:
pd.concat([df, df2], axis=1)
   id1  id2  link_1  link_2  link_3  link_4
0  a    e    cat     100     a       f
1  b    f    dog     100     b       g
2  c    g    cow     100     d       h
3  d    h    tree    100     c       e
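The question mentions that some rows hold NaN instead of a tuple, and the comprehension above would fail on those because a float cannot be unpacked. Here is a minimal sketch of one way to guard against that, assuming a missing entry should become NaN in all four new columns. The helper name flatten and the sample frame with a NaN in row 1 are made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question, with one missing entry (assumption).
df = pd.DataFrame({
    "id1": ["a", "b", "c", "d"],
    "id2": ["e", "f", "g", "h"],
    "tuple_of_tuple": [
        ("cat", 100, ("a", "f")),
        np.nan,  # a row where the whole tuple is missing
        ("cow", 100, ("d", "h")),
        ("tree", 100, ("c", "e")),
    ],
})

link_cols = ["link_1", "link_2", "link_3", "link_4"]


def flatten(value):
    """Flatten (x, y, (p, q)) into [x, y, p, q]; pad non-tuples with NaN."""
    if not isinstance(value, tuple):
        return [np.nan] * len(link_cols)
    *head, pair = value
    return [*head, *pair]


# Build the new columns row by row, reusing df's index so concat aligns.
df2 = pd.DataFrame(
    [flatten(v) for v in df.pop("tuple_of_tuple")],
    columns=link_cols,
    index=df.index,
)
result = pd.concat([df, df2], axis=1)
print(result)
```

Passing index=df.index keeps the rows aligned even if df does not have a default 0..n-1 index, which matters because pd.concat with axis=1 aligns on index labels.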
Upvotes: 4