BlueM

Reputation: 6888

How to flatten a column in a dataframe

I have the following dataframe:

import pandas as pd

df = pd.DataFrame(
    data=[
        ['X', '1'], [float('nan'), '2'], [float('nan'), '3'],
        ['Y', '4'], [float('nan'), '5'], [float('nan'), '6']])

     0  1
0    X  1
1  NaN  2
2  NaN  3
3    Y  4
4  NaN  5
5  NaN  6

How can I transform this dataframe so that the values in the second column are collected into a list/array for each new value in column 0?

After the transform it should look like this:

     0  1
0    X  [1,2,3]
3    Y  [4,5,6]

Preserving the index isn't important. Since I'm a pandas beginner, I'm having a hard time solving this without iterating over the rows with a for loop.

Upvotes: 2

Views: 225

Answers (1)

jpp

Reputation: 164643

You can forward-fill column 0 with pd.Series.ffill, then use GroupBy + apply with list:

df[0] = df[0].ffill()
res = df.groupby(0)[1].apply(list).reset_index()

print(res)

   0          1
0  X  [1, 2, 3]
1  Y  [4, 5, 6]
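
If you would rather not overwrite column 0 in the original dataframe, a variant along these lines should also work. It is only a sketch: the name key is illustrative, and sort=False is my addition to keep the groups in order of first appearance rather than sorted order.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ['X', '1'], [float('nan'), '2'], [float('nan'), '3'],
        ['Y', '4'], [float('nan'), '5'], [float('nan'), '6']])

# Forward-fill column 0 into a separate Series so df itself is left unchanged.
key = df[0].ffill()

# Group column 1 by the filled key and collect each group's values into a list.
# sort=False keeps the groups in order of first appearance.
res = df.groupby(key, sort=False)[1].agg(list).reset_index()

print(res)
```

Note that the values in column 1 are strings in the question's data, so the lists contain strings as well.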

In general, such a data structure is not recommended, as a series of lists removes the ability to perform vectorised operations. The dtype of your series of lists will be object, meaning each element is a pointer to an arbitrary Python object.
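
To illustrate the point about vectorised operations, here is a minimal sketch that keeps the data in long form instead of building lists. It assumes the values in column 1 are meant to be numeric (they are strings in the question), and the names long_df and totals are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ['X', '1'], [float('nan'), '2'], [float('nan'), '3'],
        ['Y', '4'], [float('nan'), '5'], [float('nan'), '6']])

# Keep one row per value: fill the group label down and convert to numbers.
long_df = df.copy()
long_df[0] = long_df[0].ffill()
long_df[1] = pd.to_numeric(long_df[1])  # assumption: column 1 holds numbers

# Group-wise aggregations now run on a numeric column.
totals = long_df.groupby(0)[1].sum()

print(long_df.dtypes)
print(totals)
```

With the list-based result, the same sum would need a Python-level loop or apply over each list.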

Upvotes: 2
