concatenate rows in column based on value in following row

Question

I have a pandas df like this:

df1 = pd.DataFrame({"strings":pd.Series(["a very ", "very boring text", " I cannot read it", "Hi everyone"," please go home ","or I will go ","now"]),
                "changetype":pd.Series([0,0,-1,0,1,1,1])})

Every time changetype in a row equals changetype in the next row, I'd like to concatenate the strings. The final df should look like this:

df2 = pd.DataFrame({"strings":pd.Series(["a very very boring text", " I cannot read it", "Hi everyone"," please go home or I will go now"]),
                "changetype":pd.Series([0,-1,0,1,])})

I started like this but didn't know how to continue:

for row, rowplus in zip(df1.changetype, df1.changetype[1:]):
    if row == rowplus:
        # concat rows here ...
        pass

jezrael · Accepted Answer

Use groupby with a helper Series that labels each run of consecutive equal values, aggregating strings with join and changetype with first:

s = df1['changetype'].ne(df1['changetype'].shift()).cumsum()
df3 = df1.groupby(s).agg({'strings':' '.join, 'changetype':'first'}).reset_index(drop=True)
print (df3)
                              strings  changetype
0            a very  very boring text           0
1                    I cannot read it          -1
2                         Hi everyone           0
3   please go home  or I will go  now           1
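Note that joining with ' ' leaves double spaces wherever a fragment already ends in a space. If the goal is to reproduce df2 from the question exactly, one possible variant is to join with an empty string instead. This is a sketch, not part of the original answer; the names run_id and df4 are introduced here for illustration, and it assumes every fragment already carries the whitespace it needs.

```python
import pandas as pd

df1 = pd.DataFrame({
    "strings": ["a very ", "very boring text", " I cannot read it", "Hi everyone",
                " please go home ", "or I will go ", "now"],
    "changetype": [0, 0, -1, 0, 1, 1, 1],
})

# Label each run of consecutive equal changetype values.
# Renamed (illustrative name) so the group key does not share a name with a column.
run_id = df1["changetype"].ne(df1["changetype"].shift()).cumsum().rename("run_id")

# Join with an empty separator: the fragments are assumed to contain their own spaces.
df4 = (
    df1.groupby(run_id)
       .agg({"strings": "".join, "changetype": "first"})
       .reset_index(drop=True)
)
print(df4)
```

If the fragments do not reliably carry their own spacing, stripping each one and joining with a single space may be a safer choice.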

Explanation:

Compare the column with its shifted version using ne (!=), then apply cumsum to get a Series that labels the consecutive groups:

print (s)
0    1
1    1
2    2
3    3
4    4
5    4
6    4
Name: changetype, dtype: int32
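To see how the helper Series is built, the one-liner can be split into its intermediate steps. The following is a small sketch; the names prev, is_new_run, run_id and steps are introduced here for illustration only.

```python
import pandas as pd

changetype = pd.Series([0, 0, -1, 0, 1, 1, 1], name="changetype")

# Step 1: each row's previous value (the first row has none, so it becomes NaN).
prev = changetype.shift()

# Step 2: True wherever the value differs from the previous row, i.e. a new run starts.
# The first row compares against NaN, so it is always True.
is_new_run = changetype.ne(prev)   # [True, False, True, True, True, False, False]

# Step 3: a cumulative sum of the booleans numbers the runs.
run_id = is_new_run.cumsum()       # [1, 1, 2, 3, 4, 4, 4]

steps = pd.DataFrame({
    "changetype": changetype,
    "prev": prev,
    "is_new_run": is_new_run,
    "run_id": run_id,
})
print(steps)
```

Because the label increases only when the value changes, the two separate runs of 0 (rows 0-1 and row 3) get different labels, which a plain groupby on changetype would not do.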
