Reputation: 173
I have a large dataframe (approx. 10k rows); the first few rows of what I'll call df look like this:
logtime | zone | value
01/01/2017 06:05:00 | 0 | 14.5
01/01/2017 06:05:00 | 1 | 14.5
01/01/2017 06:05:00 | 2 | 17.0
01/01/2017 06:25:00 | 0 | 14.5
01/01/2017 06:25:00 | 1 | 14.5
01/01/2017 06:25:00 | 2 | 10.0
01/01/2017 06:50:00 | 0 | 10.0
01/01/2017 06:50:00 | 1 | 10.0
01/01/2017 06:50:00 | 2 | 10.0
01/01/2017 07:50:00 | 0 | 14.5
01/01/2017 07:50:00 | 1 | 14.5
01/01/2017 07:50:00 | 2 | 14.5
etc.
I am looking to drop consecutive duplicates per zone, so that I am left only with the rows where a zone's value changes. For example, if zone 1 sits at 14.5 over two consecutive logtimes, the repeated row is removed until the value changes to 10.0, leaving a dataframe like:
logtime | zone | value
01/01/2017 06:05:00 | 0 | 14.5
01/01/2017 06:05:00 | 1 | 14.5
01/01/2017 06:05:00 | 2 | 17.0
01/01/2017 06:25:00 | 2 | 10.0
01/01/2017 06:50:00 | 0 | 10.0
01/01/2017 06:50:00 | 1 | 10.0
01/01/2017 07:50:00 | 0 | 14.5
01/01/2017 07:50:00 | 1 | 14.5
01/01/2017 07:50:00 | 2 | 14.5
etc.
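To make this reproducible, here is a minimal sketch that builds the sample above (logtime is kept as plain strings for simplicity; in practice it would probably be parsed with pd.to_datetime):

import pandas as pd

# sample data from the table above; logtime left as strings
df = pd.DataFrame({
    'logtime': ['01/01/2017 06:05:00'] * 3 + ['01/01/2017 06:25:00'] * 3
             + ['01/01/2017 06:50:00'] * 3 + ['01/01/2017 07:50:00'] * 3,
    'zone': [0, 1, 2] * 4,
    'value': [14.5, 14.5, 17.0, 14.5, 14.5, 10.0,
              10.0, 10.0, 10.0, 14.5, 14.5, 14.5],
})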
My understanding is that drop_duplicates only keeps the first occurrence of each unique row (or column subset), regardless of position, so it doesn't work for my aim.
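For example, a quick check against the sample above:

print(df.drop_duplicates(subset=['zone', 'value']))

keeps only the first row for each zone/value pair, so the 07:50:00 rows for zones 0 and 1 disappear entirely, even though they represent genuine changes back to 14.5.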
I also tried .loc with shift:

removeduplicates = df.loc[(df.logtime != df.logtime.shift(1))
                          | (df.zone != df.zone.shift(1))
                          | (df.value != df.value.shift(1))]
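Inspecting the mask suggests why this can't work: zone changes on every consecutive row, so the (df.zone != df.zone.shift(1)) term alone is True everywhere. A quick check:

mask = ((df.logtime != df.logtime.shift(1))
        | (df.zone != df.zone.shift(1))
        | (df.value != df.value.shift(1)))
print(mask.all())  # True: every row passes, so nothing is dropped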
So the expression runs without error, but it just returns the whole frame unchanged instead of the desired output. Thanks!
Upvotes: 1
Views: 124
Reputation: 29635
You can create a boolean mask where the diff between successive values per group of zone is not equal to 0. The first value of each group diffs to NaN, and NaN != 0 is True, so those rows are kept:
print(df[df.groupby('zone')['value'].diff().ne(0)])
                logtime  zone  value
0   01/01/2017 06:05:00     0   14.5
1   01/01/2017 06:05:00     1   14.5
2   01/01/2017 06:05:00     2   17.0
5   01/01/2017 06:25:00     2   10.0
6   01/01/2017 06:50:00     0   10.0
7   01/01/2017 06:50:00     1   10.0
9   01/01/2017 07:50:00     0   14.5
10  01/01/2017 07:50:00     1   14.5
11  01/01/2017 07:50:00     2   14.5
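An equivalent sketch (just an alternative, not required) compares each value with the previous value in its zone via groupby + shift; this also works for non-numeric columns, where diff is unavailable:

print(df[df.groupby('zone')['value'].shift().ne(df['value'])])

The first row of each group compares against NaN and is therefore kept, so this selects the same rows as the diff-based mask.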
Upvotes: 2