Delete variables based on their duplicate consecutive values in pandas

Question

Suppose I have a dataframe in Python with index, variable and value columns. I would like to delete the rows in which a variable has the same value as it had at its previous instant (its previous row). In other words, I would like to get another dataframe that keeps only the rows where a variable's value changes. Example:
Original dataframe:

Index  var  val
0      x1   10
1      x2   11
2      x1   10
3      x3   12
4      x1   13
5      x3   14
6      x2   12
7      x1   14

Output table:

Index  var  val
0      x1   10
1      x2   11
2      x3   12
3      x1   13
4      x3   14
5      x2   12
6      x1   14
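A straightforward (but not fast) way to pin the rule down is a single pass that remembers the last value seen for each variable. The sketch below is a plain-Python reference version only. It assumes the rows are already in time order, and the frame it builds is just the example table above.

```python
import pandas as pd

# The example from the question; the index is named 'Index' as in the table.
df = pd.DataFrame(
    {'var': ['x1', 'x2', 'x1', 'x3', 'x1', 'x3', 'x2', 'x1'],
     'val': [10, 11, 10, 12, 13, 14, 12, 14]},
    index=pd.Index(range(8), name='Index'),
)

last_seen = {}  # most recent value observed for each variable
keep = []       # True if the row differs from that variable's previous value
for var, val in zip(df['var'], df['val']):
    keep.append(last_seen.get(var) != val)  # first occurrence: None != val, so keep
    last_seen[var] = val

result = df.loc[keep].reset_index(drop=True)  # renumber 0..n-1 as in the output table
print(result)
```

Only the row at Index 2 (x1 = 10 again) is dropped. Because the loop runs in Python, it is mainly useful for checking a faster version against.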

What would be the most efficient way to do this, given that I may have a lot of variables (around 1000)?
Cheers.

cardamom · Accepted Answer

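I found a step-by-step way. Sort the rows by variable, add a few True/False 'helper' columns that flag whether the variable or its value changed from the row above, keep only the rows where something changed, and finally restore the original index and return just the desired columns.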

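import pandas as pd

# Example frame from the question, with the time index named 'Index'
df = pd.DataFrame({'var': ['x1', 'x2', 'x1', 'x3', 'x1', 'x3', 'x2', 'x1'],
                   'val': [10, 11, 10, 12, 13, 14, 12, 14]},
                  index=pd.Index(range(8), name='Index'))

df = df.reset_index()                               # 'Index' becomes a regular column
df = df.sort_values(['var', 'Index'])               # each variable's rows, in time order
df['varHasChanged'] = df['var'] != df['var'].shift(1)
df['valHasChanged'] = df['val'] != df['val'].shift(1)
df = df[df['varHasChanged'] | df['valHasChanged']]  # first row of a variable, or a real change
df = df.sort_values('Index')                        # back to the original row order
df = df.set_index('Index')
df = df[['var', 'val']]                             # drop the helper columns
print(df)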
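With ~1000 variables, a vectorized alternative (a different technique from the helper-column approach above) is to compare each row directly with the previous value of the same variable using `groupby('var')['val'].shift()`. This needs no explicit sort and leaves the rows in their original order. The sketch assumes the rows are already in time order and that `val` can be compared with `!=`. It should scale better than Python-level loops, but it is worth timing on the real data.

```python
import pandas as pd

# The example from the question; the index is named 'Index' as in the table.
df = pd.DataFrame(
    {'var': ['x1', 'x2', 'x1', 'x3', 'x1', 'x3', 'x2', 'x1'],
     'val': [10, 11, 10, 12, 13, 14, 12, 14]},
    index=pd.Index(range(8), name='Index'),
)

# Previous value of the same variable, aligned to each row; NaN for a first occurrence.
prev_val = df.groupby('var')['val'].shift()

# Keep a row when its value differs from that previous value (x != NaN is True).
changed = df['val'].ne(prev_val)

# reset_index(drop=True) renumbers rows 0..n-1 as in the question's output;
# leave it off to keep the original 'Index' labels instead.
result = df[changed].reset_index(drop=True)
print(result)
```

The same two lines work unchanged however many distinct variables the `var` column contains.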
