Reputation: 6888
I have the following dataframe:
import pandas as pd

df = pd.DataFrame(
    data=[
        ['X', '1'], [float('nan'), '2'], [float('nan'), '3'],
        ['Y', '4'], [float('nan'), '5'], [float('nan'), '6']])
     0  1
0    X  1
1  NaN  2
2  NaN  3
3    Y  4
4  NaN  5
5  NaN  6
How can I transform this dataframe so that the values in the second column are collected into a list/array for each new value in column 0?
After the transform it should look like this:
   0        1
0  X  [1,2,3]
3  Y  [4,5,6]
Preserving the index isn't important. Since I'm a pandas beginner, I'm having a hard time solving this without iterating over the rows with a for loop.
Upvotes: 2
Views: 225
Reputation: 164643
You can use pd.Series.ffill before GroupBy + apply with list:
df[0] = df[0].ffill()
res = df.groupby(0)[1].apply(list).reset_index()
print(res)
   0          1
0  X  [1, 2, 3]
1  Y  [4, 5, 6]
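If you would rather not overwrite column 0 in place, a variant along the same lines is sketched below. It is only an illustration of the ffill + groupby idea; the names `key` and `res` are placeholders, and it assumes the same `df` as in the question.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ['X', '1'], [float('nan'), '2'], [float('nan'), '3'],
        ['Y', '4'], [float('nan'), '5'], [float('nan'), '6']])

# Forward-fill the labels into a separate Series, leaving df untouched.
key = df[0].ffill()

# Group column 1 by the filled labels and collect each group into a list.
res = df[1].groupby(key).agg(list).reset_index()
print(res)
```

Because the filled labels live in `key`, the original NaN values in `df[0]` are still there afterwards, which may or may not be what you want.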
In general, such a data structure is not recommended, as a series of lists removes the ability to perform vectorised operations. The dtype of your series of lists will be object, which can contain pointers to arbitrary types.
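To make the point about dtype concrete, here is a rough sketch comparing the list-valued series with a "long" layout that keeps one row per value. The column names `group` and `value` are made up for the example, and converting the strings with `pd.to_numeric` is an assumption about what you want to do with the data.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ['X', '1'], [float('nan'), '2'], [float('nan'), '3'],
        ['Y', '4'], [float('nan'), '5'], [float('nan'), '6']])

# Series of lists: each cell holds a Python list, so the dtype is object.
lists = df[1].groupby(df[0].ffill()).agg(list)
print(lists.dtype)

# Long layout (hypothetical column names): one row per value, numeric dtype.
long_df = pd.DataFrame({
    "group": df[0].ffill(),
    "value": pd.to_numeric(df[1]),
})

# Aggregations on the long layout stay vectorised.
sums = long_df.groupby("group")["value"].sum()
print(sums)
```

With the long layout you can still get at the per-group values whenever needed, while sums, means and similar aggregations run on a numeric column rather than on Python lists.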
Upvotes: 2