Reputation: 129
I've got a pandas dataframe in which one of the columns contains lists with different lengths. The solutions to explode lists in pandas all assume that the lists to be exploded are all the same length.
This is my df:
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 [No shared codes]
1 21:23 20:55 LX 818 [Dummy, LH 5809]
2 21:27 21:00 JU 375 [No shared codes]
4 21:28 21:00 LX 770 [Dummy, SN 5102]
7 21:31 21:10 LX 1842 [Dummy, LH 5880, TP 8184, A3 1985]
And this what I am looking for:
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
1 21:23 20:55 LX 818 LH 5809
2 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
4 21:28 21:00 LX 770 SN 5102
7 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
7 21:31 21:10 LX 1842 TP 8184
7 21:31 21:10 LX 1842 A3 1985
Has anybody got any suggestions?
Upvotes: 8
Views: 4238
Reputation: 13349
Pandas >=0.25
df:
Name Data
0 Bar [Product, Item, X]
1 Foo [Product, Misc]
Using explode:
df = df.explode('Data')
df:
Name Data
0 Bar Product
0 Bar Item
0 Bar X
1 Foo Product
1 Foo Misc
Upvotes: 6
Reputation: 294218
Very similar to @coldspeed. I took a few different steps.
s = df['Shared Codes']
i = np.arange(len(df)).repeat(s.str.len())
df.iloc[i, :-1].assign(**{'Shared Codes': np.concatenate(s.values)})
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
1 21:23 20:55 LX 818 LH 5809
2 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
4 21:28 21:00 LX 770 SN 5102
7 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
7 21:31 21:10 LX 1842 TP 8184
7 21:31 21:10 LX 1842 A3 1985
Upvotes: 6
Reputation: 323226
Ok, I will post it again for more information and other genius solutions please check link1 and link2
df.set_index(['Dep','Exp','Fl-No'])['Shared Codes'].apply(pd.Series).stack().reset_index().drop('level_3',1)
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
2 21:23 20:55 LX 818 LH 5809
3 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
5 21:28 21:00 LX 770 SN 5102
6 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
8 21:31 21:10 LX 1842 TP 8184
9 21:31 21:10 LX 1842 A3 1985
Also, using pd.wide_to_long
However, personally do not recommended cause overkill.
df1=df['Shared Codes'].apply(pd.Series)
df1.columns='sur'+df1.columns.astype(str)
df=pd.concat([df,df1],axis=1)
pd.wide_to_long(df,['sur'],['Dep','Exp','Fl-No'],'lol').reset_index().drop(['lol','Shared Codes'],axis=1).dropna()
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
2 21:23 20:55 LX 818 LH 5809
3 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
5 21:28 21:00 LX 770 SN 5102
6 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
8 21:31 21:10 LX 1842 TP 8184
9 21:31 21:10 LX 1842 A3 1985
Upvotes: 2
Reputation: 402333
One possibility using np.repeat
and np.hstack
:
print(df)
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 [No shared codes]
1 21:23 20:55 LX 818 [Dummy, LH 5809]
2 21:27 21:00 JU 375 [No shared codes]
4 21:28 21:00 LX 770 [Dummy, SN 5102]
7 21:31 21:10 LX 1842 [Dummy, LH 5880, TP 8184, A3 1985]
x = df.iloc[:, :-1].values.repeat(df['Shared Codes'].apply(len), 0)
y = df['Shared Codes'].apply(pd.Series).stack().values.reshape(-1, 1)
out = pd.DataFrame(np.hstack((x, y)), columns=df.columns)
print(out)
Dep Exp Fl-No Shared Codes
0 20:58 20:55 LX 736 No shared codes
1 21:23 20:55 LX 818 Dummy
2 21:23 20:55 LX 818 LH 5809
3 21:27 21:00 JU 375 No shared codes
4 21:28 21:00 LX 770 Dummy
5 21:28 21:00 LX 770 SN 5102
6 21:31 21:10 LX 1842 Dummy
7 21:31 21:10 LX 1842 LH 5880
8 21:31 21:10 LX 1842 TP 8184
9 21:31 21:10 LX 1842 A3 1985
Upvotes: 4