Reputation: 2273

Pandas: groupby and concat strings with condition

I have a dataset

id   category   description   status
11   A          Text_1        Finished
11   A          Text_2        Pause
11   A          Text_3        Started
22   A          Text_1        Pause
33   B          Text_1        Finished
33   B          Text_2        Finished
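
For anyone who wants to reproduce this, the table above can be built as a DataFrame. This is a minimal sketch: it assumes the variable is called data (the name used in the snippet further down) and that id is an integer column, which the table does not confirm.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: `id` is stored as int; the other columns are plain strings.
data = pd.DataFrame({
    "id": [11, 11, 11, 22, 33, 33],
    "category": ["A", "A", "A", "A", "B", "B"],
    "description": ["Text_1", "Text_2", "Text_3", "Text_1", "Text_1", "Text_2"],
    "status": ["Finished", "Pause", "Started", "Pause", "Finished", "Finished"],
})

print(data)
```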

And I want to group the data by id and concatenate description only for rows with status = 'Finished'.

So the desired output is

id    category   description
11    A          Text_1
22    A          
33    B          Text_1 Text_2

I can concat it using

data.groupby(['id', 'category'])['description'].apply(' '.join).reset_index()

But how can I apply a condition inside this expression?

Upvotes: 0

Answers (3)

constantstranger

Reputation: 9379

Here's a way:

key = ['id', 'category']
df2 = data[key].drop_duplicates().join(
    data.query("status == 'Finished'").groupby(key).description.apply(' '.join), 
    on=key).fillna('').reset_index(drop=True)

Explanation:

Use query() to keep only the rows whose status is "Finished", use groupby() to group by the key [id, category], then apply your str.join() logic to the description values within each group.
Take a de-duplicated copy of the key columns and use DataFrame.join() to expand the filtered results so they contain every key pair, then use fillna() to replace NaN with an empty string in the description column for keys that were filtered out.

Output:

   id category    description
0  11        A         Text_1
1  22        A
2  33        B  Text_1 Text_2
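
The same one-liner can be unpacked into named steps to make the explanation easier to follow. This is only a sketch: the intermediate names (finished, joined, all_keys) are made up for illustration, and the sample data is rebuilt from the question with id assumed to be an integer.

```python
import pandas as pd

# Sample data from the question (assumption: `id` is an int column).
data = pd.DataFrame({
    "id": [11, 11, 11, 22, 33, 33],
    "category": ["A", "A", "A", "A", "B", "B"],
    "description": ["Text_1", "Text_2", "Text_3", "Text_1", "Text_1", "Text_2"],
    "status": ["Finished", "Pause", "Started", "Pause", "Finished", "Finished"],
})

key = ["id", "category"]

# Step 1: keep only the finished rows.
finished = data.query("status == 'Finished'")

# Step 2: join the descriptions per group; the result is a Series
# named "description" and indexed by (id, category).
joined = finished.groupby(key)["description"].apply(" ".join)

# Step 3: every distinct key pair, including those with no finished rows.
all_keys = data[key].drop_duplicates()

# Step 4: left-join the joined strings onto the full key list;
# keys without a match get NaN, which fillna turns into ''.
df2 = all_keys.join(joined, on=key).fillna("").reset_index(drop=True)

print(df2)
```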

Upvotes: 1

mozway

Reputation: 261860

You can use groupby.apply with a conditional and a default value if the group is empty after filtering:

out = (data
 .groupby(['id', 'category'])
 .apply(lambda g: ' '.join(d['description'])
                  if len(d:=g[g['status'].eq('Finished')])
                  else '' )
 .reset_index(name='description')
 )

Output:

   id category    description
0  11        A         Text_1
1  22        A               
2  33        B  Text_1 Text_2
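
If the walrus operator (Python 3.8+) is not available, one possible variation on the same idea is to blank out the unwanted descriptions first and then join what is left in each group. This is a sketch of an alternative, not the code above; the name masked is made up for illustration, and the sample data is rebuilt from the question with id assumed to be an integer.

```python
import pandas as pd

# Sample data from the question (assumption: `id` is an int column).
data = pd.DataFrame({
    "id": [11, 11, 11, 22, 33, 33],
    "category": ["A", "A", "A", "A", "B", "B"],
    "description": ["Text_1", "Text_2", "Text_3", "Text_1", "Text_1", "Text_2"],
    "status": ["Finished", "Pause", "Started", "Pause", "Finished", "Finished"],
})

# Replace descriptions of non-finished rows with NaN.
masked = data["description"].where(data["status"].eq("Finished"))

# Group the masked column by the key columns and join the non-missing
# values; a group with nothing left yields an empty string.
out = (
    masked.groupby([data["id"], data["category"]])
    .agg(lambda s: " ".join(s.dropna()))
    .reset_index()
)

print(out)
```

Because every group is still present when the join runs, no default value or reindexing step is needed for groups such as id 22.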

Upvotes: 2

BENY

Reputation: 323326

You can filter before the groupby, then reindex to bring back the missing groups:

out = (
    data.loc[data.status == 'Finished']
    .groupby(['id', 'category'])['description']
    .apply(' '.join)
    # restore every (id, category) pair; use '' for groups with no finished rows
    .reindex(pd.MultiIndex.from_frame(data[['id', 'category']].drop_duplicates()),
             fill_value='')
    .reset_index()
)

Output:
   id category    description
0  11        A         Text_1
1  22        A               
2  33        B  Text_1 Text_2

Upvotes: 3
