Anand
Anand

Reputation: 139

In-place update in pandas: update the value of the cell based on a condition

      DOB        Name      
0   1956-10-30   Anna
1   1993-03-21   Jerry        
2   2001-09-09   Peter     
3   1993-01-15   Anna   
4   1999-05-02   James 
5   1962-12-17   Jerry
6   1972-05-04   Kate

In the dataframe similar to the one above where I have duplicate names. So I am want to add a suffix '_0' to the name if DOB is before 1990 and a duplicate name.

I am expecting a result like this

      DOB        Name      
0   1956-10-30   Anna_0
1   1993-03-21   Jerry        
2   2001-09-09   Peter     
3   1993-01-15   Anna   
4   1999-05-02   James
5   1962-12-17   Jerry_0
6   1972-05-04   Kate

I am using the following

df['Name'] = df[(df['DOB'] < '01-01-1990') & (df['Name'].isin(['Anna','Jerry']))].Name.apply(lambda x: x+'_0')

But I am getting this result

      DOB        Name      
0   1956-10-30   Anna_0
1   1993-03-21   NaN
2   2001-09-09   NaN     
3   1993-01-15   NaN   
4   1999-05-02   NaN
5   1962-12-17   Jerry_0
6   1972-05-04   NaN

How can I add a suffix to the Name which is a duplicate and have to be born before 1990.

Upvotes: 1

Views: 1125

Answers (2)

Ynjxsjmh
Ynjxsjmh

Reputation: 30050

Problem in your df['Name'] = df[(df['DOB'] < '01-01-1990') & (df['Name'].isin(['Anna','Jerry']))].Name.apply(lambda x: x+'_0') is that df[(df['DOB'] < '01-01-1990') & (df['Name'].isin(['Anna','Jerry']))] is a filtered dataframe whose rows are less than the original. When you assign it back, the not filtered rows doesn't have corresponding value in the filtered dataframe, so it becomes NaN.

You can try mask instead

m = (df['DOB'] < '1990-01-01') & df['Name'].duplicated(keep=False)

df['Name'] = df['Name'].mask(m, df['Name']+'_0')

Upvotes: 1

mozway
mozway

Reputation: 261924

You can use masks and boolean indexing:

# is the year before 1990?
m1 = pd.to_datetime(df['DOB']).dt.year.lt(1990)
# is the name duplicated?
m2 = df['Name'].duplicated(keep=False)

# if both conditions are True, add '_0' to the name
df.loc[m1&m2, 'Name'] += '_0'

output:

          DOB     Name
0  1956-10-30   Anna_0
1  1993-03-21    Jerry
2  2001-09-09    Peter
3  1993-01-15     Anna
4  1999-05-02    James
5  1962-12-17  Jerry_0
6  1972-05-04     Kate

Upvotes: 0

Related Questions