Federico Dorato

Reputation: 784

Pandas: how can I collect the elements grouped by groupby into a set?

I have the following (example) dataset:

>>> pd.DataFrame([["001", "Apple"],["002","Strawberry"],["001", None],["002","Strawberry"], ["003", "Apple"],["003","Strawberry"],], columns = ["Deal", "Product"])

  Deal     Product
0  001       Apple
1  002  Strawberry
2  001        None
3  002  Strawberry
4  003       Apple
5  003  Strawberry

I would like to group all the products linked to a deal into a set, like this:

  Deal              Product
0  001              {Apple}
1  002         {Strawberry}
2  003  {Strawberry, Apple}

I have posted an attempted solution as one of the answers below; I would like to understand whether I am doing it the right (Pythonic, fastest) way.
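For reference, the answers below refer to this frame as df; a minimal setup with the same data as above is:

import pandas as pd

df = pd.DataFrame(
    [["001", "Apple"], ["002", "Strawberry"], ["001", None],
     ["002", "Strawberry"], ["003", "Apple"], ["003", "Strawberry"]],
    columns=["Deal", "Product"],
)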

Upvotes: 0

Views: 55

Answers (4)

Ch3steR

Reputation: 20669

You can use SeriesGroupBy.agg with pd.Series.dropna and set.

To remove '' just use set difference here, i.e. {1, 2, 3} - {3} = {1, 2}. So set(x.dropna()) - {''} gives a set without '' after dropping all missing values.

df.groupby('Deal')['Product'].agg(lambda x: set(x.dropna()) - {''})

Deal
001                {Apple}
002           {Strawberry}
003    {Strawberry, Apple}
Name: Product, dtype: object
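To get back a DataFrame shaped like the desired output (Deal as a column) rather than a Series, reset_index can be chained on the same expression; a small sketch assuming the df above:

df.groupby('Deal')['Product'].agg(lambda x: set(x.dropna()) - {''}).reset_index()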

Upvotes: 1

Henry Yik

Reputation: 22503

You can use notnull:

print (df.loc[df['Product'].notnull()].groupby("Deal")["Product"].apply(set))

Deal
001                {Apple}
002           {Strawberry}
003    {Strawberry, Apple}

Use df.loc[~df['Product'].isin([None, ""])]... if you want to take care of both None and ''. Or use the methods from the other answers for filtering.
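Put together with the groupby, that filter would look like this (a sketch assuming the example df; on this data it gives the same result as above, since there are no '' values to drop):

print (df.loc[~df['Product'].isin([None, ""])].groupby("Deal")["Product"].apply(set))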

Upvotes: 2

jezrael

Reputation: 862681

The idea is to remove missing values with Series.notna, chained by & (bitwise AND) with a test for non-empty strings:

mask = df['Product'].notna() & df['Product'].ne('')
df = df[mask].groupby('Deal')['Product'].agg(set).reset_index()
print (df)
  Deal              Product
0  001              {Apple}
1  002         {Strawberry}
2  003  {Apple, Strawberry}
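A variant of the same filtering idea (not part of the answer above) is to map empty strings to NaN first, so that a single dropna handles both cases; a sketch assuming the same df:

import numpy as np

out = (df.assign(Product=df['Product'].replace('', np.nan))
         .dropna(subset=['Product'])
         .groupby('Deal')['Product'].agg(set)
         .reset_index())
print (out)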

Upvotes: 2

Federico Dorato

Reputation: 784

I started from this answer to solve my problem:

# Turn every element in Product into a set of one or zero elements;
# this also ensures we don't carry null values forward
df["Product"] = df["Product"].apply(lambda val: {val} if val not in [None, ""] else set())
# Use the answer mentioned above to union the single-element sets
df.groupby('Deal').agg({'Product': lambda x: set.union(*x)}).reset_index('Deal')

Final result:

>>> df
  Deal              Product
0  001              {Apple}
1  002         {Strawberry}
2  003  {Strawberry, Apple}
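Since the question also asks which way is the fastest, the candidates can be timed directly; a rough timeit sketch (the repetition factor and number are arbitrary, and the ranking may change with real data sizes):

import timeit

setup = """
import pandas as pd
df = pd.DataFrame(
    [["001", "Apple"], ["002", "Strawberry"], ["001", None],
     ["002", "Strawberry"], ["003", "Apple"], ["003", "Strawberry"]] * 1000,
    columns=["Deal", "Product"],
)
"""

stmt_agg = "df.groupby('Deal')['Product'].agg(lambda x: set(x.dropna()) - {''})"
stmt_mask = (
    "mask = df['Product'].notna() & df['Product'].ne('');"
    "df[mask].groupby('Deal')['Product'].agg(set)"
)

print (timeit.timeit(stmt_agg, setup=setup, number=100))
print (timeit.timeit(stmt_mask, setup=setup, number=100))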

Upvotes: 0
