Reputation: 1722
I have a dataframe that looks like:
df:
Source Text
Agent aa
Agent ab
Visitor ac
Agent ad
Visitor ae
Visitor ba
Visitor bb
Agent bc
Agent bd
Agent be
I would like to create a new dataframe that looks the following:
Source Text
Agent aa ab
Visitor ac
Agent ad
Visitor ae ba bb
Agent bc bd be
So, for each new instance of an agent/visitor talking I want to concatenate the text of the different things one person said in a new cell.
I've found this post, however this is too specific and I don't see how to apply this to my dataframe.
Upvotes: 1
Views: 344
Reputation: 593
Try this.
d=df.groupby("Source").groups
print(pd.DataFrame([ [k,' '.join(list(df.groupby("Source").get_group(k)["Text"]))] for k in d.keys() ],columns=["Source","Text"]))
Upvotes: 0
Reputation: 862511
You can groupby
by Series
which is create by cumsum
of shifted column Source
by shift
with join
, last use double Series.reset_index
- first for remove first level of MultiIndex and second for column from Source
:
g = df['Source'].ne(df['Source'].shift()).cumsum()
df1 = (df.groupby([g, 'Source'])['Text']
.apply(' '.join)
.reset_index(level=0, drop=True)
.reset_index())
print (df1)
Source Text
0 Agent aa ab
1 Visitor ac
2 Agent ad
3 Visitor ae ba bb
4 Agent bc bd be
Upvotes: 1