How to change consecutive repeating values in pandas dataframe series to nan or 0?

Question

I have a pandas dataframe created from measured numbers. When something goes wrong with the measurement, the last value is repeated. I would like to do two things:
1. Change all repeating values either to nan or 0.
2. Keep the first repeating value and change all other values nan or 0.

I have found solutions using "shift" but they drop repeating values. I do not want to drop repeating values.My data frame looks like this:

df = pd.DataFrame(np.random.randn(15, 3))
df.iloc[4:8,0]=40
df.iloc[12:15,1]=22
df.iloc[10:12,2]=0.23

giving a dataframe like this:

            0          1         2
0    1.239916   1.109434  0.305490
1    0.248682   1.472628  0.630074
2   -0.028584  -1.116208  0.074299
3   -0.784692  -0.774261 -1.117499
4   40.000000   0.283084 -1.495734
5   40.000000  -0.074763 -0.840403
6   40.000000   0.709794 -1.000048
7   40.000000   0.920943  0.681230
8   -0.701831   0.547689 -0.128996
9   -0.455691   0.610016  0.420240
10  -0.856768  -1.039719  0.230000
11   1.187208   0.964340  0.230000
12   0.116258  22.000000  1.119744
13  -0.501180  22.000000  0.558941
14   0.551586  22.000000 -0.993749

what I would like to be able to do is write some code that would filter the data and give me a data frame like this:

           0         1         2
0   1.239916  1.109434  0.305490
1   0.248682  1.472628  0.630074
2  -0.028584 -1.116208  0.074299
3  -0.784692 -0.774261 -1.117499
4        NaN  0.283084 -1.495734
5        NaN -0.074763 -0.840403
6        NaN  0.709794 -1.000048
7        NaN  0.920943  0.681230
8  -0.701831  0.547689 -0.128996
9  -0.455691  0.610016  0.420240
10 -0.856768 -1.039719       NaN
11  1.187208  0.964340       NaN
12  0.116258       NaN  1.119744
13 -0.501180       NaN  0.558941
14  0.551586       NaN -0.993749

or even better keep the first value and change the rest to NaN. Like this:

            0          1         2
0    1.239916   1.109434  0.305490
1    0.248682   1.472628  0.630074
2   -0.028584  -1.116208  0.074299
3   -0.784692  -0.774261 -1.117499
4   40.000000   0.283084 -1.495734
5         NaN  -0.074763 -0.840403
6         NaN   0.709794 -1.000048
7         NaN   0.920943  0.681230
8   -0.701831   0.547689 -0.128996
9   -0.455691   0.610016  0.420240
10  -0.856768  -1.039719  0.230000
11   1.187208   0.964340       NaN
12   0.116258  22.000000  1.119744
13  -0.501180        NaN  0.558941
14   0.551586        NaN -0.993749

Haleemur Ali · Accepted Answer

using shift & mask:

df.shift(1) == df compares the next row to the current for consecutive duplicates.

df.mask(df.shift(1) == df)

# outputs    
            0          1         2
0    0.365329   0.153527  0.143244
1    0.688364   0.495755  1.065965
2    0.354180  -0.023518  3.338483
3   -0.106851   0.296802 -0.594785
4   40.000000   0.149378  1.507316
5         NaN  -1.312952  0.225137
6         NaN  -0.242527 -1.731890
7         NaN   0.798908  0.654434
8    2.226980  -1.117809 -1.172430
9   -1.228234  -3.129854 -1.101965
10   0.393293   1.682098  0.230000
11  -0.029907  -0.502333       NaN
12   0.107994  22.000000  0.354902
13  -0.478481        NaN  0.531017
14  -1.517769        NaN  1.552974

if you want to remove all the consecutive duplicates, test that the previous row is also the same as the current row

df.mask((df.shift(1) == df) | (df.shift(-1) == df))

How to change consecutive repeating values in pandas dataframe series to nan or 0?

Answers (2)

Related Questions