Reputation: 676

Choosing non repetitive values in dataframe columns

I have the following dataframe.

import pandas as pd
dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame([1,1,1,-1,-1,-1,1,1,-1,1], index=dates, columns=list('A'))

Expected output from df

df_out=pd.DataFrame([1,0,0,-1,0,0,1,0,-1,1], index=dates, columns=list('A'))

I want to keep the first value of each run of +1 or -1 and replace the repeated values that follow it with zero.

df can be a big dataframe with 10 columns, and I want this conversion applied to all the columns. What is an efficient way to do this without using a for loop? Please suggest a way forward. Thanks in advance.

Upvotes: 2

Answers (4)

Nidal

Reputation: 415

Try:

df['A'] = df['A'] * (df['A'].diff() != 0)

How this works:

diff() calculates the difference between each value and the one before it in your series. If the diff is 0, the value is a repetition of the previous one. The first row has no predecessor, so its diff is NaN.

Therefore we can do a != 0 check, which creates a boolean series that is True wherever there was no repetition and False where there was one. NaN != 0 evaluates to True, so the first row is kept.

In arithmetic, a boolean series behaves like a series of zeroes and ones, so multiplying it against the original series zeroes out all the repetitions.
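To make the three steps visible, here is a small sketch that keeps each intermediate result in its own variable. It reuses the question's data as a Series; the names step, keep and result are only illustrative.

```python
import pandas as pd

dates = pd.date_range('20130101', periods=10)
s = pd.Series([1, 1, 1, -1, -1, -1, 1, 1, -1, 1], index=dates, name='A')

step = s.diff()      # NaN for the first row, 0 wherever the value repeats
keep = step != 0     # NaN != 0 is True, so the first row is kept
result = s * keep    # True acts as 1 and False as 0 in the multiplication

# Show the intermediate columns side by side
print(pd.DataFrame({'A': s, 'diff': step, 'keep': keep, 'result': result}))
```

Printing the columns side by side should make it easy to check that only the first value of each run survives.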

Upvotes: 2

yatu

Reputation: 88236

IIUC you could use Series.diff along with ne to check which first differences are not 0, in other words, which values differ from the one before them. DataFrame.where then keeps the rows where that check is True and replaces the rest with 0:

df.where(df.A.diff().ne(0), 0)

            A
2013-01-01  1
2013-01-02  0
2013-01-03  0
2013-01-04 -1
2013-01-05  0
2013-01-06  0
2013-01-07  1
2013-01-08  0
2013-01-09 -1
2013-01-10  1
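Since the question asks for the conversion on all columns, the same idea can probably be applied to the whole frame at once by calling diff on the DataFrame instead of on a single column. This is a sketch under that assumption; column 'B' is made-up sample data, not from the question.

```python
import pandas as pd

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame(
    {
        'A': [1, 1, 1, -1, -1, -1, 1, 1, -1, 1],
        'B': [-1, -1, 1, 1, 1, -1, 1, 1, 1, -1],  # hypothetical second column
    },
    index=dates,
)

# DataFrame.diff works column by column, so the mask has the same shape as df
out = df.where(df.diff().ne(0), 0)
print(out)
```

Each column is processed independently, so no explicit loop over the columns is needed.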

Upvotes: 2

Jurgen Strydom

Reputation: 3930

A third option:

import pandas as pd
import numpy as np

def check_dup(data):
    # data is a NumPy array holding two values: [previous, current]
    if data[0] == data[1]:
        return 0  # current value repeats the previous one
    else:
        return data[1]
df = pd.DataFrame(np.random.randint(0,2, (10,1))*2-1)

df.rolling(window=2).apply(check_dup, raw=True)
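One thing to watch with the rolling approach: with window=2 the first row has no complete window, so it comes back as NaN, and the result is float rather than integer. A possible way to get the question's expected output, sketched here on the question's data rather than the random frame above, is to fill that gap from the original frame and cast back.

```python
import pandas as pd

def check_dup(window):
    # window is a NumPy array: [previous value, current value]
    return 0 if window[0] == window[1] else window[1]

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame([1, 1, 1, -1, -1, -1, 1, 1, -1, 1], index=dates, columns=['A'])

rolled = df.rolling(window=2).apply(check_dup, raw=True)

# The first row is NaN because it has no predecessor;
# restore it from the original frame and cast back to integers.
out = rolled.fillna(df).astype(int)
print(out)
```

Because apply calls a Python function for every window, this is likely to be slower on large frames than the diff or shift based answers.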

Upvotes: 0

anky

Reputation: 75080

Try using np.where():

import numpy as np

df.A = np.where(df.A.ne(df.A.shift()), df.A, 0)
print(df)

            A
2013-01-01  1
2013-01-02  0
2013-01-03  0
2013-01-04 -1
2013-01-05  0
2013-01-06  0
2013-01-07  1
2013-01-08  0
2013-01-09 -1
2013-01-10  1
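If np.where is applied to the whole frame rather than one column, note that it returns a plain NumPy array, so the index and column labels have to be put back. A minimal sketch, assuming a second made-up column 'B' for illustration:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame(
    {
        'A': [1, 1, 1, -1, -1, -1, 1, 1, -1, 1],
        'B': [-1, -1, 1, 1, 1, -1, 1, 1, 1, -1],  # hypothetical second column
    },
    index=dates,
)

# Compare every value with the one above it; the first row compares
# against NaN, which counts as "not equal", so it is kept.
values = np.where(df.ne(df.shift()), df, 0)  # plain ndarray, labels are lost

# Rebuild a DataFrame with the original index and column names
out = pd.DataFrame(values, index=df.index, columns=df.columns)
print(out)
```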

Upvotes: 2
