Reputation: 676
I have the following dataframe.
import pandas as pd
dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame([1,1,1,-1,-1,-1,1,1,-1,1], index=dates, columns=list('A'))
Expected output from df
df_out=pd.DataFrame([1,0,0,-1,0,0,1,0,-1,1], index=dates, columns=list('A'))
I want to keep the first value of each run of +1 or -1 and substitute zero wherever the previous value is repeated.
df can be a big dataframe with 10 columns, and I want this conversion applied to all of the columns. What is an efficient way to do this without using a for loop? Please suggest the way forward. Thanks in anticipation.
Upvotes: 2
Views: 174
Reputation: 415
Try:
df['A'] = df['A'] * (df['A'].diff() != 0)
How this works:
diff() calculates the difference between successive values in your series. If the diff is 0 then we know there was a repetition.
Therefore we can do a != 0 check, which will create a boolean series that is True wherever there was no repetition and False where there was a repetition.
Boolean series can be used as a series of zeroes and ones and multiplied against the original series resulting in zeroing out all the repetitions.
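To make the steps above concrete, here is a small sketch that runs them one at a time on the dataframe from the question. The intermediate names step, mask and result are only there for illustration and are not part of the original answer.

```python
import pandas as pd

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame([1, 1, 1, -1, -1, -1, 1, 1, -1, 1], index=dates, columns=list('A'))

step = df['A'].diff()    # NaN for the first row, 0 wherever the value repeats
mask = step != 0         # NaN != 0 evaluates to True, so the first row is kept
result = df['A'] * mask  # True acts as 1 and False as 0 in the multiplication

print(result)
```

Note that the first row survives only because comparing NaN with 0 using != gives True.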
Upvotes: 2
Reputation: 88236
IIUC you could use Series.diff along with ne to check which first differences are not 0, or in other words, which subsequent values are not repeated, and replace those that are False with 0 using DataFrame.where:
df.where(df.A.diff().ne(0), 0)
            A
2013-01-01  1
2013-01-02  0
2013-01-03  0
2013-01-04 -1
2013-01-05  0
2013-01-06  0
2013-01-07  1
2013-01-08  0
2013-01-09 -1
2013-01-10  1
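Since the question asks for the conversion on all columns, the same idea should extend to the whole frame by calling diff on the DataFrame instead of on a single column. A minimal sketch, assuming every column holds only +1 and -1; column B and its values are made up here for illustration:

```python
import pandas as pd

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame(
    {
        'A': [1, 1, 1, -1, -1, -1, 1, 1, -1, 1],
        'B': [-1, -1, 1, 1, 1, -1, 1, -1, -1, 1],  # hypothetical second column
    },
    index=dates,
)

# DataFrame.diff works column by column, so the mask has the same shape as df
out = df.where(df.diff().ne(0), 0)
print(out)
```

Each column is compared only with its own previous row, so no explicit loop over columns is needed.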
Upvotes: 2
Reputation: 3930
A third option:
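import pandas as pd
import numpy as np

def check_dup(data):
    # With raw=True, data is a NumPy array holding the previous and current value
    if data[0] == data[1]:
        return 0
    else:
        return data[1]

df = pd.DataFrame(np.random.randint(0, 2, (10, 1)) * 2 - 1)
# The first row has no previous value, so its result is NaN
df.rolling(window=2).apply(check_dup, raw=True)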
Upvotes: 0
Reputation: 75080
Try using np.where():
df.A=np.where(df.A.ne(df.A.shift()),df.A,0)
print(df)
            A
2013-01-01  1
2013-01-02  0
2013-01-03  0
2013-01-04 -1
2013-01-05  0
2013-01-06  0
2013-01-07  1
2013-01-08  0
2013-01-09 -1
2013-01-10  1
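For a frame with several columns, np.where can also take the whole DataFrame, but it returns a plain NumPy array, so the index and column labels have to be put back. A rough sketch under that assumption; column B is invented for the example:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame(
    {
        'A': [1, 1, 1, -1, -1, -1, 1, 1, -1, 1],
        'B': [1, -1, -1, -1, 1, 1, -1, 1, 1, 1],  # hypothetical second column
    },
    index=dates,
)

# Keep a value where it differs from the row above, otherwise use 0
values = np.where(df.ne(df.shift()), df, 0)

# np.where drops the labels, so rebuild the DataFrame around the array
out = pd.DataFrame(values, index=df.index, columns=df.columns)
print(out)
```

The first row is kept because shift leaves NaN there, and any value compared with NaN via ne is True.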
Upvotes: 2