Reputation: 1255
I have a pandas dataframe as below:
df = pd.DataFrame({'X':[1,1,1, 0, 0]})
df
X
0 1
1 1
2 1
3 0
4 0
Now I want to create another column 'Y' whose values are based on the following conditions:
If X = 1, Y = 1
If X = 0 and previous X = 1, Y = 2
If X = 0 and previous X = 0, Y = 0
So, my final output should look like below:
X Y
0 1 1
1 1 1
2 1 1
3 0 2
4 0 0
This can be achieved by iterating over the rows, tracking the current and previous row with iloc, but I want a faster, more efficient way of doing this.
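For reference, the row-by-row approach described above might look roughly like the sketch below. The function name `label_rows_loop` is hypothetical, and treating the first row as if its previous X were 0 is an assumption, since the conditions do not say what happens when there is no previous row.

```python
import pandas as pd


def label_rows_loop(df):
    """Slow baseline: walk the rows and compare each X with the previous X."""
    y = []
    for i in range(len(df)):
        current = df["X"].iloc[i]
        # Assumption: the first row has no predecessor, so treat it as 0.
        previous = df["X"].iloc[i - 1] if i > 0 else 0
        if current == 1:
            y.append(1)
        elif previous == 1:
            y.append(2)
        else:
            y.append(0)
    return y


df = pd.DataFrame({"X": [1, 1, 1, 0, 0]})
df["Y"] = label_rows_loop(df)
print(df)
```

This does one Python-level lookup per row, which is the cost a vectorized solution would avoid.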
Upvotes: 3
Views: 344
Reputation: 5460
Celius provided an answer with nested calls to np.where. This can become unwieldy as the number of conditions grows. You can use np.select instead to achieve the same result:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'X': [1, 1, 1, 0, 0]
})
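conditions = [
    df["X"] == 1,
    (df["X"] == 0) & (df["X"].shift() == 1),
    (df["X"] == 0) & (df["X"].shift() == 0),
]
# Values assigned where the matching condition is True (the first match wins)
values = [1, 2, 0]
# Rows matching no condition get NaN, which makes Y a float column
df['Y'] = np.select(conditions, values, default=np.nan)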
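Because the default is np.nan, the resulting Y column is float64 rather than integer. If an integer column is preferred, one possible variant is sketched below. It assumes a first row with X = 0 should be treated as if the previous X were 0, and it uses -1 as a hypothetical sentinel for rows that match no condition; neither choice comes from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"X": [1, 1, 1, 0, 0]})

# Assumption: fill the missing "previous" value of the first row with 0,
# which also keeps the shifted column as integers instead of floats.
prev = df["X"].shift(fill_value=0)

conditions = [
    df["X"] == 1,
    (df["X"] == 0) & (prev == 1),
    (df["X"] == 0) & (prev == 0),
]
values = [1, 2, 0]

# -1 is a hypothetical sentinel for rows matching none of the conditions.
df["Y"] = np.select(conditions, values, default=-1)
print(df)
```

With an integer default, np.select returns an integer array, so Y matches the integer output shown in the question.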
Upvotes: 0
Reputation: 18367
You can try using np.where and shift:
import pandas as pd
import numpy as np
df = pd.DataFrame({'X':[1,1,1, 0, 0]})
df['Y'] = np.where(df['X'] == 1, 1, np.where(df['X'].shift(periods=1) == 1, 2, 0))
print(df)
Output:
X Y
0 1 1
1 1 1
2 1 1
3 0 2
4 0 0
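To see why the nested np.where works, it may help to look at the shifted column on its own. The sketch below uses a different, hypothetical sample series that starts with 0 and alternates more, so each branch is exercised at least once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, not the data from the question.
x = pd.Series([0, 1, 0, 0, 1, 0], name="X")

# shift(periods=1) moves every value down one row; the first row becomes NaN.
prev = x.shift(periods=1)

# Outer where: X == 1 gives 1. Inner where: previous X == 1 gives 2, else 0.
# Comparing NaN with 1 is False, so a leading X == 0 falls through to 0.
y = np.where(x == 1, 1, np.where(prev == 1, 2, 0))

print(pd.DataFrame({"X": x, "prev": prev, "Y": y}))
```

Since the inner np.where is only consulted where X is not 1, it does not need to test X == 0 explicitly as long as X only ever holds 0 or 1.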
Upvotes: 1