Reputation: 693
I have a pandas df like this:
import pandas as pd
df1 = pd.DataFrame({"strings":pd.Series(["a very ", "very boring text", " I cannot read it", "Hi everyone"," please go home ","or I will go ","now"]),
"changetype":pd.Series([0,0,-1,0,1,1,1])})
Every time changetype in a row equals changetype in the next row, I'd like to concatenate the strings of those rows. Thus, the final df should look like this:
df2 = pd.DataFrame({"strings":pd.Series(["a very very boring text", " I cannot read it", "Hi everyone"," please go home or I will go now"]),
"changetype":pd.Series([0,-1,0,1,])})
I started like this but didn't know how to continue:
for row, rowplus in zip(df1.changetype, df1.changetype[1:]):
    if row == rowplus:
        # concat rows here ...
Upvotes: 1
Views: 132
Reputation: 862661
Use groupby by a helper Series, aggregating with first and join:
s = df1['changetype'].ne(df1['changetype'].shift()).cumsum()
df3 = df1.groupby(s).agg({'strings':' '.join, 'changetype':'first'}).reset_index(drop=True)
print(df3)
                           strings  changetype
0          a very very boring text           0
1                 I cannot read it          -1
2                      Hi everyone           0
3  please go home or I will go now           1
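One thing to watch: `' '.join` inserts a space between the pieces, and the strings in `df1` already carry their own leading or trailing spaces, so the joined text ends up with doubled spaces compared with the desired `df2`. A minimal sketch, assuming an exact match with `df2` is wanted, swaps in `''.join`. The names `run_id` and `df_exact` are mine, not from the answer, and the helper Series is renamed only to keep it distinct from the `changetype` column:

```python
import pandas as pd

df1 = pd.DataFrame({
    "strings": ["a very ", "very boring text", " I cannot read it", "Hi everyone",
                " please go home ", "or I will go ", "now"],
    "changetype": [0, 0, -1, 0, 1, 1, 1],
})

# Label each run of consecutive equal changetype values (1, 1, 2, 3, 4, 4, 4)
run_id = df1["changetype"].ne(df1["changetype"].shift()).cumsum().rename("run_id")

# ''.join concatenates the pieces as they are, without adding a separator
df_exact = (
    df1.groupby(run_id)
       .agg({"strings": "".join, "changetype": "first"})
       .reset_index(drop=True)
)
print(df_exact["strings"].tolist())
```

If the source strings had no padding of their own, `' '.join` as in the answer would be the better choice.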
Explanation:
Compare the column with its shifted version using ne (!=), then add cumsum to get a Series that labels consecutive groups:
print(s)
0    1
1    1
2    2
3    3
4    4
5    4
6    4
Name: changetype, dtype: int32
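To see how the helper Series is built, it may help to split the one-liner into its three steps. This is just an illustrative breakdown; the intermediate names `shifted`, `is_new_run` and `run_id` are mine:

```python
import pandas as pd

changetype = pd.Series([0, 0, -1, 0, 1, 1, 1], name="changetype")

# Step 1: previous row's value (NaN for the first row)
shifted = changetype.shift()

# Step 2: True wherever the value differs from the previous row,
# i.e. at the start of each new run (the first row compares against NaN, so it is True)
is_new_run = changetype.ne(shifted)

# Step 3: cumulative sum of the True flags gives one label per consecutive run
run_id = is_new_run.cumsum()

steps = pd.DataFrame({
    "changetype": changetype,
    "shifted": shifted,
    "is_new_run": is_new_run,
    "run_id": run_id,
})
print(steps)
```

Rows that share a `run_id` are exactly the consecutive rows with equal `changetype`, which is why grouping by it merges only neighbours and keeps the two separate `0` runs apart.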
Upvotes: 1