Reputation: 2238
How does one add a different substring to each row based on a condition in pandas?
Here is a dummy dataframe that I created:
import numpy as np
import pandas as pd
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,5,size=(5, 2)))
df.columns = ['A','B']
If I wanted to replace the values in B with the string YYYY for those rows where the value in A is less than 2, I would do it this way:
df.loc[df['A'] < 2, 'B'] = 'YYYY'
This is the original df:
   A  B
0  3  4
1  0  1
2  3  0
3  0  1
4  4  4
And this is df after the replacement:
   A     B
0  3     4
1  0  YYYY
2  3     0
3  0  YYYY
4  4     4
What I want instead is a different suffix on each matching row:
   A    B
0  3    4
1  0  1_1
2  3    0
3  0  1_2
4  4    4
Upvotes: 2
Views: 121
Reputation: 863301
You need to generate a list of counters with the same length as the number of True values in the mask, using range and sum, then convert the counters to strings and join them to the original values:
m = df['A'] < 2
df.loc[m, 'B'] = df.loc[m, 'B'].astype(str) + '_' + list(map(str, range(1, m.sum() + 1)))
print(df)
   A    B
0  3    4
1  0  1_1
2  3    0
3  0  1_2
4  4    4
Or you can use f-strings to generate the new list:
m = df['A'] < 2
df.loc[m, 'B'] = [f'{b}_{a}' for a, b in zip(range(1, m.sum() + 1), df.loc[m, 'B'])]
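The same running counter can probably also be built without range, by taking the cumulative sum of the boolean mask. The following is only a sketch of that variant, not part of the original answer. It assumes the frame holds the values shown in the question (typed in by hand here rather than generated with the random seed), and it converts the whole B column to strings up front so that no strings are written into an integer column:

```python
import pandas as pd

# Same values as the dummy frame in the question, entered by hand
df = pd.DataFrame({'A': [3, 0, 3, 0, 4], 'B': [4, 1, 0, 1, 4]})

m = df['A'] < 2

# Running count of True values in the mask: 1 at the first match, 2 at the second, ...
counter = m.cumsum()

# Build "value_counter" for every row, then keep it only where the mask is True
suffixed = df['B'].astype(str) + '_' + counter.astype(str)
df['B'] = df['B'].astype(str).where(~m, suffixed)

print(df)
```

Note that with this variant every value in B ends up as a string, including the rows that were not changed.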
EDIT1: If the counter should restart for each distinct value of B, use groupby with cumcount:
m = df['A'] < 4
df.loc[m, 'B'] = df.loc[m, 'B'].astype(str) + '_' + df[m].groupby('B').cumcount().add(1).astype(str)
print(df)
   A    B
0  3  4_1
1  0  1_1
2  3  0_1
3  0  1_2
4  4    4
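For anyone who wants to run the per-group numbering in isolation, here is a self-contained sketch. It assumes the same hand-typed values as the question's frame, and it casts B to strings before assigning, which is an addition of mine to avoid writing strings into an integer column on recent pandas versions. The name sub is only illustrative:

```python
import pandas as pd

# Same values as the dummy frame in the question, entered by hand
df = pd.DataFrame({'A': [3, 0, 3, 0, 4], 'B': [4, 1, 0, 1, 4]})

m = df['A'] < 4

# Values of B in the matching rows only
sub = df.loc[m, 'B']

# Number the occurrences of each distinct B value, starting from 1
counter = sub.groupby(sub).cumcount().add(1)

# Make the column string-typed first, then write the suffixed values back
df['B'] = df['B'].astype(str)
df.loc[m, 'B'] = sub.astype(str) + '_' + counter.astype(str)

print(df)
```

Row 3 gets the suffix 2 because it is the second matching row whose B value is 1; rows 0 and 2 each get 1 because their B values occur only once among the matching rows.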
Upvotes: 2