Reputation: 2495
I want to collapse consecutive duplicates in column val when val equals 1, replacing each run with a single row whose val is the number of rows in that run.
start should come from the first row of the run and end from the last row.
import pandas as pd

df = pd.DataFrame()
df['start'] = [1, 2, 3, 4, 5, 6, 18, 30, 31]
df['end'] = [2, 3, 4, 5, 6, 18, 30, 31, 32]
df['val'] = [1, 1, 1, 1, 1, 12, 12, 1, 1]
df
start end val
0 1 2 1
1 2 3 1
2 3 4 1
3 4 5 1
4 5 6 1
5 6 18 12
6 18 30 12
7 30 31 1
8 31 32 1
Expected result:
start end val
0 1 6 5
1 6 18 12
2 18 30 12
3 30 32 2
I tried:
df[~((df.val == 1) & (df.val == df.val.shift(1)) & (df.val == df.val.shift(-1)))]
start end val
0 1 2 1
4 5 6 1
5 6 18 12
6 18 30 12
7 30 31 1
8 31 32 1
This only drops the interior rows of each run, and I can't figure out how to get from there to my expected result. Any suggestions?
Upvotes: 2
Views: 559
Reputation: 1167
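You could also do a two-liner with a key to group by. Aggregate val with 'sum' so that each run of 1s collapses to its count (rows where val is not 1 each form their own group, so their val is unchanged):
m = (df.val.ne(1) | df.val.ne(df.val.shift())).cumsum()
df = df.groupby(m).agg({'start': 'first', 'end': 'last', 'val': 'sum'}).reset_index(drop=True)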
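To see what the grouping key looks like on the question's data, here is a minimal, self-contained sketch. The names `key` and `out` are just illustrative, and it aggregates val with 'sum' so that runs of 1 turn into their length.

```python
import pandas as pd

df = pd.DataFrame({
    'start': [1, 2, 3, 4, 5, 6, 18, 30, 31],
    'end':   [2, 3, 4, 5, 6, 18, 30, 31, 32],
    'val':   [1, 1, 1, 1, 1, 12, 12, 1, 1],
})

# A new group starts wherever val is not 1, or val differs from the previous row.
key = (df['val'].ne(1) | df['val'].ne(df['val'].shift())).cumsum()
print(key.tolist())  # [1, 1, 1, 1, 1, 2, 3, 4, 4]

# Rows with val != 1 each get their own group, so 'sum' leaves them unchanged;
# a run of n ones sums to n, which is the count the question asks for.
out = (df.groupby(key)
         .agg({'start': 'first', 'end': 'last', 'val': 'sum'})
         .reset_index(drop=True))
print(out)
```

Note that the sum only equals the count because the collapsed value is 1. For some other value you would probably want 'size' or 'count' on the matching rows instead.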
Upvotes: 2
Reputation: 1430
The solution by @jezrael is perfect, but here is a slightly different approach:
df['aux'] = (df['val'] != df['val'].shift()).cumsum()
df.loc[df['val'] == 1, 'end'] = df[df['val'] == 1].groupby('aux')['end'].transform('last')
df.loc[df['val'] == 1, 'val'] = df.groupby('aux')['val'].transform('sum')
df = df.drop_duplicates(subset=df.columns.difference(['start']), keep='first')
df = df.drop(columns=['aux'])
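For anyone wondering why the val == 1 mask matters here, this small sketch prints the intermediate values on the question's data. The names `run_sum`, `is_one` and `new_val` are made up for illustration, and it uses `Series.where` only to display the masked result, not as a replacement for the `.loc` assignment above.

```python
import pandas as pd

df = pd.DataFrame({
    'start': [1, 2, 3, 4, 5, 6, 18, 30, 31],
    'end':   [2, 3, 4, 5, 6, 18, 30, 31, 32],
    'val':   [1, 1, 1, 1, 1, 12, 12, 1, 1],
})

# Label each run of equal consecutive values.
aux = (df['val'] != df['val'].shift()).cumsum().rename('aux')
print(aux.tolist())  # [1, 1, 1, 1, 1, 2, 2, 3, 3]

# Per-run sum, broadcast back to every row of the run.
run_sum = df.groupby(aux)['val'].transform('sum')
print(run_sum.tolist())  # [5, 5, 5, 5, 5, 24, 24, 2, 2]

# Only rows where val == 1 take the run sum; the 12s keep their own value.
is_one = df['val'] == 1
new_val = df['val'].where(~is_one, run_sum)
print(new_val.tolist())  # [5, 5, 5, 5, 5, 12, 12, 2, 2]
```

Without the mask, the two 12 rows would both become 24, since they sit in the same run.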
Upvotes: 0
Reputation: 863226
Use:
# mask by condition
m = df.val == 1
# consecutive groups
g = m.ne(m.shift()).cumsum()
# filter by condition and aggregate per group
df1 = df.groupby(g[m]).agg({'start': 'first', 'end': 'last', 'val': 'sum'})
# concat together; for correct order, create the index from g
df = pd.concat([df1, df.set_index(g)[~m.values]]).sort_index().reset_index(drop=True)
print(df)
start end val
0 1 6 5
1 6 18 12
2 18 30 12
3 30 32 2
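A small sketch of the same steps on the question's data, to show what the filtered key does. Grouping by `g[m]` leaves out the rows that have no key, so only the runs of 1 are aggregated. The names `rest` and `out` are illustrative. Passing `kind='mergesort'` is my own addition, not part of the answer: rows 5 and 6 share the index label 2, and a stable sort kind makes sure they keep their original order.

```python
import pandas as pd

df = pd.DataFrame({
    'start': [1, 2, 3, 4, 5, 6, 18, 30, 31],
    'end':   [2, 3, 4, 5, 6, 18, 30, 31, 32],
    'val':   [1, 1, 1, 1, 1, 12, 12, 1, 1],
})

m = df['val'] == 1
g = m.ne(m.shift()).cumsum()
print(g.tolist())  # [1, 1, 1, 1, 1, 2, 2, 3, 3]

# g[m] has no entries for rows 5 and 6, so those rows are not aggregated.
df1 = df.groupby(g[m]).agg({'start': 'first', 'end': 'last', 'val': 'sum'})
print(df1.values.tolist())  # [[1, 6, 5], [30, 32, 2]]

# Untouched rows, indexed by their run label so they sort into place.
rest = df.set_index(g)[~m.values]

out = (pd.concat([df1, rest])
         .sort_index(kind='mergesort')  # stable: ties keep their original order
         .reset_index(drop=True))
print(out.values.tolist())  # [[1, 6, 5], [6, 18, 12], [18, 30, 12], [30, 32, 2]]
```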
Upvotes: 3