Reputation: 33
I have a dataframe in which one column (col1) contains the values Y or N. I would like to assign random, non-repeating numbers to the next column (col2) based on the values in col1: if the value in col1 is N, col2 should get a new number; if the value in col1 is Y, col2 should repeat the value from the previous row. I tried a for loop over the rows using df.iterrows(), but the numbers in col2 came out equal for all Ns.
Example of the dataframe I want to get:
df = pd.DataFrame({'col1': ['N', 'Y', 'Y', 'N', 'N', 'Y'], 'col2': [1, 1, 1, 2, 3, 3]})
where each new N gets a new number in the other column, while each Y repeats the number from the previous row.
Upvotes: 3
Views: 2427
Reputation: 3967
Assuming a DataFrame df:
df = pd.DataFrame(['N', 'Y', 'Y', 'N', 'N', 'Y'], columns=['YN'])
  YN
0  N
1  Y
2  Y
3  N
4  N
5  Y
Using itertuples
(no repetition):
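np.random.seed(42)
# One distinct label per 'N' row, in shuffled order
arr = np.arange(1, len(df[df.YN == 'N']) + 1)
np.random.shuffle(arr)
cnt = 0
for row in df.itertuples():
    if row.YN == 'N':
        # A new 'N' takes the next unused label
        df.loc[row.Index, 'new'] = arr[cnt]
        cnt += 1
    else:
        # Leave 'Y' rows empty so ffill can copy the previous value
        df.loc[row.Index, 'new'] = np.nan
df.new = df.new.ffill().astype(int)
df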
  YN  new
0  N    1
1  Y    1
2  Y    1
3  N    2
4  N    3
5  Y    3
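The loop above can probably be avoided altogether. The following is only a sketch of a vectorized variant, not a drop-in replacement: it assumes the first row is 'N' (otherwise there is no previous value to repeat), and it uses np.random.default_rng instead of np.random.seed, so the concrete numbers will differ from the output shown above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['N', 'Y', 'Y', 'N', 'N', 'Y'], columns=['YN'])

# Each 'N' starts a new group; cumsum numbers the groups 1, 2, 3, ...
# Assumption: the first row is 'N', so no row ends up in group 0.
group = (df['YN'] == 'N').cumsum()

# Shuffle the labels 1..n_groups so the assignment is random but never repeats.
rng = np.random.default_rng(42)
labels = rng.permutation(np.arange(1, group.max() + 1))

# Look up each row's label through its 0-based group number.
df['new'] = labels[group.to_numpy() - 1]
```

If the plain running count 1, 2, 3, ... is enough (as in the example in the question), `group` on its own already gives that column.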
Using apply
(repetition may arise with small number range):
np.random.seed(42)
df['new'] = df.YN.apply(lambda x: np.random.randint(10) if x == 'N' else np.nan).ffill().astype(int)
  YN  new
0  N    6
1  Y    6
2  Y    6
3  N    3
4  N    7
5  Y    7
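If random values from a wider range are wanted without any chance of repetition, one option is to draw all of them at once without replacement. This is a minimal sketch under a few assumptions: the range 0..99 is an arbitrary choice, the first row is 'N', and the range holds at least as many values as there are 'N' rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['N', 'Y', 'Y', 'N', 'N', 'Y'], columns=['YN'])

rng = np.random.default_rng(42)
is_n = df['YN'] == 'N'

# Draw one distinct value per 'N' row from 0..99 (arbitrary range, an assumption).
values = rng.choice(100, size=int(is_n.sum()), replace=False)

# Put the draws on the 'N' rows, leave 'Y' rows empty, then carry values forward.
df['new'] = np.nan
df.loc[is_n, 'new'] = values
df['new'] = df['new'].ffill().astype(int)
```

Because replace=False is used, two different 'N' rows can never receive the same number, no matter how small the range is (as long as it is large enough to cover all 'N' rows).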
Upvotes: 2