Reputation: 1596
I have the following DataFrame:
>>> df
    t    c  f  e
0   1  100  2  1
1   1  200  1  1
2   1  300  4  0
3   1  400  2  0
4   2  100  3  1
5   2  200  3  1
6   2  300  4  1
7   2  400  1  0
8   3  100  4  0
9   3  200  3  0
10  3  300  1  1
11  3  400  4  1
12  4  100  1  1
13  4  200  4  1
14  4  300  4  1
15  4  400  2  1
I want to add new columns based on pairwise comparisons between rows. Specifically, I want a column 'rr' that is 1 if the i-th row and the (i+4)-th row have the same value in column 'e' (and 0 otherwise, including when index i+4 does not exist), and similarly a column 'rr2' that is 1 if the i-th row and the (i+1)-th row have the same value in column 'e'.
>>> df
    t    c  f  e  rr  rr2
0   1  100  2  1   1    1
1   1  200  1  1   1    0
2   1  300  4  0   0    1
3   1  400  2  0   1    0
4   2  100  3  1   0    1
5   2  200  3  1   0    1
6   2  300  4  1   1    0
7   2  400  1  0   0    1
8   3  100  4  0   0    1
9   3  200  3  0   0    0
10  3  300  1  1   1    1
11  3  400  4  1   1    1
12  4  100  1  1   0    1
13  4  200  4  1   0    1
14  4  300  4  1   0    1
15  4  400  2  1   0    0
My idea was to use the apply method:
df['rr'] = df.apply(func1, axis=1)   # func1, func2: my row-wise helper functions
df['rr2'] = df.apply(func2, axis=1)
But with apply(axis=1), each function call only receives a single row, so I cannot access rows i+1 or i+4 of the original DataFrame. Is there an efficient way to do this without iterating over the rows one by one?
Upvotes: 0
Views: 63
Reputation: 30920
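Use Series.shift: shift(-k) moves the value from row i+k up to row i, so Series.eq can compare every row with the row k positions below it in one vectorized step, with no apply needed: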
df['rr'] = df['e'].eq(df['e'].shift(-4)).astype(int)   # row i vs row i+4
df['rr2'] = df['e'].eq(df['e'].shift(-1)).astype(int)  # row i vs row i+1
print(df)
    t    c  f  e  rr  rr2
0   1  100  2  1   1    1
1   1  200  1  1   1    0
2   1  300  4  0   0    1
3   1  400  2  0   1    0
4   2  100  3  1   0    1
5   2  200  3  1   0    1
6   2  300  4  1   1    0
7   2  400  1  0   0    1
8   3  100  4  0   0    1
9   3  200  3  0   0    0
10  3  300  1  1   1    1
11  3  400  4  1   1    1
12  4  100  1  1   0    1
13  4  200  4  1   0    1
14  4  300  4  1   0    1
15  4  400  2  1   0    0
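For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample frame from the question (values copied from the printout above) and applies the same two shift/eq lines:

```python
import pandas as pd

# Sample data copied from the question's printout
df = pd.DataFrame({
    't': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'c': [100, 200, 300, 400] * 4,
    'f': [2, 1, 4, 2, 3, 3, 4, 1, 4, 3, 1, 4, 1, 4, 4, 2],
    'e': [1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

# shift(-k) moves the value from row i+k up to row i (the last k rows become NaN),
# so eq() compares each row with the row k positions below it.
df['rr'] = df['e'].eq(df['e'].shift(-4)).astype(int)
df['rr2'] = df['e'].eq(df['e'].shift(-1)).astype(int)
print(df)
```

The rr and rr2 columns it produces match the output above.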
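Note:
Comparing a value with NaN always returns False. Because shift(-4) leaves the last four rows as NaN (and shift(-1) the last row), those rows get 0, as required.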
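A small sketch of where those trailing zeros come from, plus a generalization to any set of offsets. The helper add_lag_matches and its offsets mapping are hypothetical names for illustration, not part of pandas:

```python
import pandas as pd

s = pd.Series([1, 1, 0, 0])

shifted = s.shift(-1)            # values move up one row; the last row has no partner
print(shifted.tolist())          # [1.0, 0.0, 0.0, nan]
print(s.eq(shifted).tolist())    # [True, False, True, False]

def add_lag_matches(frame, col, offsets):
    """Hypothetical helper: for each {new_column: k}, flag rows whose `col`
    value equals the value k rows below (0 where row i+k does not exist)."""
    out = frame.copy()
    for name, k in offsets.items():
        out[name] = out[col].eq(out[col].shift(-k)).astype(int)
    return out

result = add_lag_matches(pd.DataFrame({'e': [1, 1, 0, 0, 1]}), 'e', {'rr': 4, 'rr2': 1})
print(result)
```

Returning a copy keeps the original frame untouched; assign the result back (df = add_lag_matches(df, 'e', {...})) if you want the columns added in place.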
Upvotes: 1