Emil
Emil

Reputation: 1722

Conditionally concatenate strings in a Pandas dataframe

I have a dataframe that looks like:

df:

      Source    Text    
      Agent     aa      
      Agent     ab      
      Visitor   ac      
      Agent     ad      
      Visitor   ae      
      Visitor   ba      
      Visitor   bb      
      Agent     bc      
      Agent     bd      
      Agent     be    

I would like to create a new dataframe that looks the following:

      Source    Text    
      Agent     aa ab   
      Visitor   ac      
      Agent     ad      
      Visitor   ae ba bb 
      Agent     bc bd be 

So, for each new instance of an agent/visitor talking I want to concatenate the text of the different things one person said in a new cell.

I've found this post, however this is too specific and I don't see how to apply this to my dataframe.

Upvotes: 1

Views: 344

Answers (2)

Shehan Ishanka
Shehan Ishanka

Reputation: 593

Try this.

d=df.groupby("Source").groups
print(pd.DataFrame([ [k,' '.join(list(df.groupby("Source").get_group(k)["Text"]))] for k in d.keys() ],columns=["Source","Text"]))

Upvotes: 0

jezrael
jezrael

Reputation: 862511

You can groupby by Series which is create by cumsum of shifted column Source by shift with join, last use double Series.reset_index - first for remove first level of MultiIndex and second for column from Source:

g = df['Source'].ne(df['Source'].shift()).cumsum()
df1 = (df.groupby([g, 'Source'])['Text']
         .apply(' '.join)
         .reset_index(level=0, drop=True)
         .reset_index())
print (df1)
    Source      Text
0    Agent     aa ab
1  Visitor        ac
2    Agent        ad
3  Visitor  ae ba bb
4    Agent  bc bd be

Upvotes: 1

Related Questions