Reputation: 85
I have the following this question where one of the columns is an object (list type cell):
I don't want to use explode
(using an older version of pandas). How to do the same for dataframe with three columns?
df
A B C
0 1 [1, 2] 3
1 1 [1, 2] 4
2 2 [3, 4] 5
My expected output is:
A B C
0 1 1 3
1 1 2 3
3 1 1 4
4 1 2 4
5 2 3 5
6 2 4 5
I found these two methods useful. How to add third column to this code.
df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
or
df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})
Upvotes: 1
Views: 902
Reputation: 28644
You can use itertools to reshape your data :
from itertools import product,chain
pd.DataFrame(chain.from_iterable((product([a],b,[c]))
for a,b,c in df.to_numpy()),
columns = df.columns)
A B C
0 1 1 3
1 1 2 3
2 1 1 4
3 1 2 4
4 2 1 5
5 2 4 5
Upvotes: 0
Reputation: 59529
You set the index to be all of the columns you want to keep tied to the list you explode:
(df.set_index(['A', 'C'])['B']
.apply(pd.Series).stack()
.reset_index()
.drop(columns='level_2').rename(columns={0: 'B'}))
A C B
0 1 3 1
1 1 3 2
2 1 4 1
3 1 4 2
4 2 5 3
5 2 5 4
Or for the second method also repeat 'C'
pd.DataFrame({'A': df.A.repeat(df.B.str.len()),
'C': df.C.repeat(df.B.str.len()),
'B': np.concatenate(df.B.to_numpy())})
Upvotes: 2