Reputation: 1315
I have a pandas dataframe with the following form:
import pandas as pd
p = pd.DataFrame({"int" : [1, 1, 1, 1, 2, 2],
                  "cod" : [[1,1], [2,2], [1,2], [3,9], [2,2], [2,2]]})
I want to group by int, which gives me a bunch of lists. I then want to flatten these lists, so I ultimately end up with a dataframe that has this form:
p = pd.DataFrame({"int" : [1, 2],
                  "cod" : [[1,1,2,2,1,2,3,9], [2,2,2,2]]})
Here is what I have so far:
p.groupby("int", as_index=False)["cod"]
I'm stuck on how to flatten the lists once I have grouped by int.
Upvotes: 1
Views: 1742
Reputation: 863176
Use sum:
df = p.groupby("int", as_index=False)["cod"].sum()
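This works because + on two lists concatenates them, so summing a group of lists flattens it by one level. A minimal plain-Python sketch of the same idea, with the groups from the question hard-coded as a dict (an assumption for illustration, not what pandas builds internally):

```python
# sum() folds with +, and list + list returns a new, concatenated list,
# so summing a group of lists flattens it one level.
groups = {1: [[1, 1], [2, 2], [1, 2], [3, 9]], 2: [[2, 2], [2, 2]]}

# start=[] is required: the default start value 0 cannot be added to a list
flat = {k: sum(v, []) for k, v in groups.items()}
print(flat)  # {1: [1, 1, 2, 2, 1, 2, 3, 9], 2: [2, 2, 2, 2]}
```

Each + copies everything accumulated so far, so the cost grows quadratically with the total length of a group; that is one reason to prefer the itertools.chain variant for long lists.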
Or use a list comprehension:
df = p.groupby("int")["cod"].apply(lambda x: [z for y in x for z in y]).reset_index()
Or numpy.concatenate:
import numpy as np
df = p.groupby("int")["cod"].apply(lambda x: np.concatenate(x.values).tolist()).reset_index()
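The nested comprehension in the first variant can be hard to read: its for clauses run left to right, exactly like nested loops. A small plain-Python sketch, using the group-1 lists from the question hard-coded as x:

```python
x = [[1, 1], [2, 2], [1, 2], [3, 9]]

# Comprehension form: "for y in x" is the outer loop, "for z in y" the inner one
flat_comp = [z for y in x for z in y]

# Equivalent explicit loops
flat_loop = []
for y in x:          # each inner list
    for z in y:      # each element of that inner list
        flat_loop.append(z)

print(flat_comp)  # [1, 1, 2, 2, 1, 2, 3, 9]
assert flat_comp == flat_loop
```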
For performance with large lists, itertools.chain should be the fastest:
from itertools import chain
df = p.groupby("int")["cod"].apply(lambda x: list(chain.from_iterable(x))).reset_index()
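To check the performance claim on your own data, a rough timing sketch could look like the following. It times only the plain-Python flattening step, not the groupby overhead, and the test data (1,000 two-element lists) and helper names are hypothetical; actual numbers depend on your machine and data shape.

```python
import timeit
from itertools import chain

# Hypothetical test data: 1,000 two-element lists, as if from one large group
lists = [[i, i + 1] for i in range(1000)]

def flatten_sum(x):
    return sum(x, [])                      # repeated list concatenation

def flatten_comp(x):
    return [z for y in x for z in y]       # nested list comprehension

def flatten_chain(x):
    return list(chain.from_iterable(x))    # lazy chaining, materialized once

# Make sure all three agree before comparing their speed
assert flatten_sum(lists) == flatten_comp(lists) == flatten_chain(lists)

for fn in (flatten_sum, flatten_comp, flatten_chain):
    t = timeit.timeit(lambda: fn(lists), number=20)
    print(f"{fn.__name__}: {t:.4f} s")
```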
See questions about flattening a list of lists for more information.
print(df)
int cod
0 1 [1, 1, 2, 2, 1, 2, 3, 9]
1 2 [2, 2, 2, 2]
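A minimal sketch that runs the three apply-based variants on the question's data and checks that they all produce the output above (the flatteners dict is just a hypothetical way to loop over them; the sum variant is left out to keep the check to the apply calls):

```python
from itertools import chain

import numpy as np
import pandas as pd

p = pd.DataFrame({"int": [1, 1, 1, 1, 2, 2],
                  "cod": [[1, 1], [2, 2], [1, 2], [3, 9], [2, 2], [2, 2]]})

# Each callable flattens one group's Series of lists into a single list
flatteners = {
    "comprehension": lambda x: [z for y in x for z in y],
    "numpy": lambda x: np.concatenate(x.values).tolist(),
    "chain": lambda x: list(chain.from_iterable(x)),
}

results = {
    name: p.groupby("int")["cod"].apply(f).reset_index()
    for name, f in flatteners.items()
}

expected = [[1, 1, 2, 2, 1, 2, 3, 9], [2, 2, 2, 2]]
for name, df in results.items():
    assert df["int"].tolist() == [1, 2], name
    assert df["cod"].tolist() == expected, name
print("all approaches agree")
```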
Upvotes: 6