tandem

Reputation: 2238

different substring for each row based on condition

How does one add a different substring to each row based on a condition in pandas?

Here is a dummy dataframe that I created:

import numpy as np
import pandas as pd
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,5,size=(5, 2)))
df.columns = ['A','B']

If I wanted to replace the values in B with the string YYYY for those rows where the value in A is less than 2, I would do it this way:

df.loc[df['A'] < 2, 'B'] = 'YYYY'

This is the original df:

   A  B
0  3  4
1  0  1
2  3  0
3  0  1
4  4  4

And this is df after the replacement:

   A     B
0  3     4
1  0  YYYY
2  3     0
3  0  YYYY
4  4     4

What I want instead is to keep the existing value in B and append a running counter to it:

   A     B
0  3     4
1  0    1_1
2  3     0
3  0    1_2
4  4     4

Upvotes: 2

Views: 121

Answers (1)

jezrael

Reputation: 863301

You need to generate a list with as many elements as there are True values in the mask, using range and sum, then convert it to strings and join it to the existing values:

m = df['A'] < 2
df.loc[m, 'B'] = df.loc[m, 'B'].astype(str) + '_' + list(map(str, range(1, m.sum() + 1)))

print (df)
   A    B
0  3    4
1  0  1_1
2  3    0
3  0  1_2
4  4    4

Or you can use f-strings to generate the new list:

m = df['A'] < 2
df.loc[m, 'B'] = [f'{b}_{a}' for a, b in zip(range(1, m.sum() + 1), df.loc[m, 'B'])]
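
As a self-contained sketch of the same idea (a variant, not one of the two snippets above): the running count of matching rows can also be taken from the cumulative sum of the boolean mask, which avoids building a Python list. The helper name `add_counter_suffix` is hypothetical, and the cast to `object` is an assumption added so that strings can be written into an integer column without a dtype complaint on recent pandas versions.

```python
import numpy as np
import pandas as pd

def add_counter_suffix(df, mask, col="B"):
    """Hypothetical helper: append _1, _2, ... to `col` where mask is True."""
    out = df.copy()
    # Cast to object so integers and strings can coexist in one column.
    out[col] = out[col].astype(object)
    # cumsum over the boolean mask numbers the True rows 1, 2, 3, ...
    counter = mask.cumsum()[mask].astype(str)
    # Both operands share the index of the masked rows, so they align.
    out.loc[mask, col] = out.loc[mask, col].astype(str) + "_" + counter
    return out

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 5, size=(5, 2)), columns=["A", "B"])
result = add_counter_suffix(df, df["A"] < 2)
print(result)
```

The function returns a modified copy, so the original df stays numeric and can be reused for the other variants.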

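EDIT1: If the counter should restart for each distinct value of B among the matching rows, use GroupBy.cumcount instead of a single range: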

m = df['A'] < 4
df.loc[m, 'B'] = df.loc[m, 'B'].astype(str) + '_' + df[m].groupby('B').cumcount().add(1).astype(str)

print (df)
   A    B
0  3  4_1
1  0  1_1
2  3  0_1
3  0  1_2
4  4    4
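
To make the difference between the two kinds of counter concrete, here is a small sketch that computes both side by side for the same mask. It is an illustration only: the frame is written out explicitly with the question's values, and the column names `global` and `per_value` are made up for the comparison.

```python
import pandas as pd

# Same values as the question's df, written out explicitly.
df = pd.DataFrame({"A": [3, 0, 3, 0, 4], "B": [4, 1, 0, 1, 4]})
m = df["A"] < 4

# Global counter: numbers every matching row 1, 2, 3, ... in row order.
global_counter = m.cumsum()[m]

# Per-value counter: restarts at 1 for each distinct value of B.
per_value_counter = df[m].groupby("B").cumcount().add(1)

comparison = pd.DataFrame({
    "B": df.loc[m, "B"],
    "global": global_counter,
    "per_value": per_value_counter,
})
print(comparison)
```

Only the second row with B equal to 1 gets a per-value counter of 2; every other matching row is the first occurrence of its value.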

Upvotes: 2
