Reputation: 1679
I have a pandas dataframe with several columns (words, start time, stop time, speaker). I would like to combine all values in the word column while the values in the data column do not change OR the values in the meta_data column do not change. Each combined row should keep the start value of the first word and the stop value of the last word in the combination. I currently have:
      word  start   stop  data  meta_data
0      but   2.72   2.85     2          9
1   that's   2.85   3.09     2          9
2  alright   3.09   3.47     2          1
3    we'll   8.43   8.69     1          4
4     have   8.69   8.97     1          4
5       to   8.97   9.07     1          4
6     okay   9.19  10.01     2          2
7     sure  10.02  11.01     2          1
8    what?  11.02  12.00     1          4
However, I would like to turn this into:
            word  start   stop  data  meta_data
0     but that's   2.72   3.09     2          9
1        alright   3.09   3.47     2          1
2  we'll have to   8.43   9.07     1          4
3           okay   9.19  10.01     2          2
4           sure  10.02  11.01     2          1
5          what?  11.02  12.00     1          4
Upvotes: 0
Views: 114
Reputation: 30940
You can do some math to build a grouping key and then use GroupBy.agg:
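# Combine both columns into a single value per row
s = df['data'] + df['meta_data']
# Start a new group each time that value differs from the previous row
groups = s.ne(s.shift()).cumsum()
new_df = (df.groupby(groups)
            .agg({'word': ' '.join, 'start': 'min',
                  'stop': 'max', 'data': 'first',
                  'meta_data': 'first'}))
print(new_df)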
            word  start   stop  data  meta_data
1     but that's   2.72   3.09     2          9
2        alright   3.09   3.47     2          1
3  we'll have to   8.43   9.07     1          4
4           okay   9.19  10.01     2          2
5           sure  10.02  11.01     2          1
6          what?  11.02  12.00     1          4
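If you think the sum could be the same for two different consecutive groups (for example data=2, meta_data=1 followed by data=1, meta_data=2, which both sum to 3), you can use a somewhat more complex function with decimals, which makes such a coincidence much less likely:
p = (df['data'] + 0.1723).pow(df['meta_data'] + 2.017)
groups = p.ne(p.shift()).cumsum()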
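If you would rather not rely on arithmetic at all, one possible alternative (not part of the answer above) is to compare both columns with the previous row directly. The sketch below rebuilds the question's data; the helper name run_ids is made up for illustration, and 'first'/'last' are used for start/stop instead of 'min'/'max' because the question asks for the first word's start and the last word's stop.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'word': ['but', "that's", 'alright', "we'll", 'have', 'to',
             'okay', 'sure', 'what?'],
    'start': [2.72, 2.85, 3.09, 8.43, 8.69, 8.97, 9.19, 10.02, 11.02],
    'stop': [2.85, 3.09, 3.47, 8.69, 8.97, 9.07, 10.01, 11.01, 12.00],
    'data': [2, 2, 2, 1, 1, 1, 2, 2, 1],
    'meta_data': [9, 9, 1, 4, 4, 4, 2, 1, 4],
})


def run_ids(frame, cols):
    """Hypothetical helper: number runs of consecutive rows where
    every column in `cols` keeps the same value."""
    keys = frame[cols]
    # A row starts a new run when any column differs from the previous row.
    # The first row is compared against NaN, so it always starts a run.
    return keys.ne(keys.shift()).any(axis=1).cumsum()


groups = run_ids(df, ['data', 'meta_data'])
new_df = (df.groupby(groups)
            .agg({'word': ' '.join, 'start': 'first', 'stop': 'last',
                  'data': 'first', 'meta_data': 'first'})
            .reset_index(drop=True))
print(new_df)

# The case where a plain sum collides: (2, 1) followed by (1, 2).
tricky = pd.DataFrame({'data': [2, 1], 'meta_data': [1, 2]})
total = tricky['data'] + tricky['meta_data']
sum_ids = total.ne(total.shift()).cumsum().tolist()
col_ids = run_ids(tricky, ['data', 'meta_data']).tolist()
print(sum_ids, col_ids)
```

On the tricky frame the sum-based key puts both rows into one group, while the column-wise comparison keeps them apart.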
Upvotes: 2
Reputation: 323386
You need to create a helper key first, then use shift + cumsum to build the group key from it:
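# Helper key: one (data, meta_data) tuple per row
df['Key'] = df[['data', 'meta_data']].apply(tuple, axis=1)
d = {'word': ' '.join, 'start': 'min', 'stop': 'max', 'data': 'first', 'meta_data': 'first'}
# A new group starts whenever the key differs from the previous row
df.groupby(df.Key.ne(df.Key.shift()).cumsum()).agg(d).reset_index(drop=True)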
Out[171]:
            word  start   stop  data  meta_data
0     but that's   2.72   3.09     2          9
1        alright   3.09   3.47     2          1
2  we'll have to   8.43   9.07     1          4
3           okay   9.19  10.01     2          2
4           sure  10.02  11.01     2          1
5          what?  11.02  12.00     1          4
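In case it helps to see what shift + cumsum is doing, here is a small sketch that lays out the intermediate steps for the question's data. The column names previous, changed and group are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# data and meta_data values copied from the question.
data = [2, 2, 2, 1, 1, 1, 2, 2, 1]
meta_data = [9, 9, 1, 4, 4, 4, 2, 1, 4]

# One tuple per row, like the Key column in the answer.
key = pd.Series(list(zip(data, meta_data)))

steps = pd.DataFrame({
    'key': key,
    'previous': key.shift(),          # key of the row above (NaN for row 0)
    'changed': key.ne(key.shift()),   # True where a new run starts
})
# A running count of the True flags gives every run its own id.
steps['group'] = steps['changed'].cumsum()
print(steps)
```

Rows that share a group id are the ones that get joined by the groupby, so the id only increases when the key differs from the row directly above it.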
Upvotes: 3