hii

Reputation: 87

Replace a value with NaN based on another column's value in pandas

I have the following test dataframe.

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12      1234       NaN    True   False
      2     12345       123       NaN       123   False    True
     33      4444        22   2343321   3223455   False   False
     66     55555       333     55555       333    True    True

If Match1 is True, I want the phone1_y value to be empty (NaN). If Match2 is True, I want the phone2_y value to be empty.

The output would look like this:

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12       NaN       NaN    True   False
      2     12345       123       NaN       NaN   False    True
     33      4444        22   2343321   3223455   False   False
     66     55555       333       NaN       NaN    True    True

It would also be nice to get the following output, where an unmatched phone_y is filled from phone_x when the other phone in the row matched:

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12       NaN        12    True   False
      2     12345       123     12345       NaN   False    True
     33      4444        22   2343321   3223455   False   False
     66     55555       333       NaN       NaN    True    True

I hope someone can help. I am working with Python and pandas.

EDIT:

It's still not working for me, I guess because my code is wrong:

import numpy as np
import pandas as pd

df = pd.read_csv('TM.csv', sep=',')  # x
df1 = pd.read_csv('TM_verrijk.csv', sep=',')  # y

df['phone1'] = df['phone1'].str.replace('-', '')
df['phone1'] = df['phone1'].str.replace('+', '')
df1['phone1'] = df1['phone1'].str.replace('-', '')
df1['phone1'] = df1['phone1'].str.replace('+', '')

df['phone2'] = df['phone2'].str.replace('-', '')
df['phone2'] = df['phone2'].str.replace('+', '')
df1['phone2'] = df1['phone2'].str.replace('-', '')
df1['phone2'] = df1['phone2'].str.replace('+', '')

dff = df.merge(df1, on='contact')

dff['phone1_x'] = pd.to_numeric(dff['phone1_x'], errors='coerce')
dff['phone1_y'] = pd.to_numeric(dff['phone1_y'], errors='coerce')

dff['phone2_x'] = pd.to_numeric(dff['phone2_x'], errors='coerce')
dff['phone2_y'] = pd.to_numeric(dff['phone2_y'], errors='coerce')

dff['Match?'] = np.where(dff['phone1_x'] == dff['phone1_y'], 'True', 'False')
dff['Match2?'] = np.where(dff['phone2_x'] == dff['phone2_y'], 'True', 'False')

When I run dff now, I get the same dataframe as in my question:

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12      1234       NaN    True   False
      2     12345       123       NaN       123   False    True
     33      4444        22   2343321   3223455   False   False
     66     55555       333     55555       333    True    True

I then ran:

dff[['phone1_y','phone2_y']] = np.where(dff[['Match?','Match2?']],
                                       np.nan, 
                                       dff[['phone1_x','phone2_x']])

This code sets every value to NaN. The other answers also give NaN everywhere:

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12       NaN       NaN    True   False
      2     12345       123       NaN       NaN   False    True
     33      4444        22       NaN       NaN   False   False
     66     55555       333       NaN       NaN    True    True

Any idea why it won't work with my code?

EDIT:

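The columns Match? and Match2? need to be boolean. My np.where(..., 'True', 'False') calls stored strings, and a non-empty string such as 'False' is truthy, so every row was treated as a match: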

dff['Match?'] = dff['Match?'].map({'False':False, 'True':True})
dff['Match2?'] = dff['Match2?'].map({'False':False, 'True':True})
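
The mapping step can probably be avoided altogether by storing the comparison result directly, since == between two Series already returns booleans. A minimal sketch, assuming a small hand-made frame that stands in for the merged dff (the values are copied from the test data above, not read from the CSV files):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged frame; values copied from the test data.
dff = pd.DataFrame({
    "contact": [1, 2, 33, 66],
    "phone1_x": [1234, 12345, 4444, 55555],
    "phone2_x": [12, 123, 22, 333],
    "phone1_y": [1234, np.nan, 2343321, 55555],
    "phone2_y": [np.nan, 123, 3223455, 333],
})

# The comparison is already a boolean Series, so no np.where or map is needed.
# NaN never compares equal, so a missing number does not count as a match.
dff["Match?"] = dff["phone1_x"] == dff["phone1_y"]
dff["Match2?"] = dff["phone2_x"] == dff["phone2_y"]

# With real booleans, np.where blanks only the matched cells.
dff[["phone1_y", "phone2_y"]] = np.where(
    dff[["Match?", "Match2?"]],
    np.nan,
    dff[["phone1_y", "phone2_y"]],
)
print(dff)
```

Keeping the match columns as booleans also means they can be used directly as masks in .loc, mask, or where.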

Thanks everyone for the answers!!

Upvotes: 1

Views: 1635

Answers (3)

jezrael

Reputation: 863401

Use numpy.where: where the match column is True, replace the value with np.nan; where it is False, keep the original value:

df[['phone1_y','phone2_y']] = np.where(df[['Match1','Match2']], 
                                       np.nan, 
                                       df[['phone1_y','phone2_y']])
print (df)
   contact  phone1_x  phone2_x   phone1_y   phone2_y  Match1  Match2
0        1      1234        12        NaN        NaN    True   False
1        2     12345       123        NaN        NaN   False    True
2       33      4444        22  2343321.0  3223455.0   False   False
3       66     55555       333        NaN        NaN    True    True
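
As an alternative sketch (not part of the answer above), the same first output can be produced one column at a time with Series.mask, which replaces values with NaN wherever the condition is True. The frame below is rebuilt by hand from the question's test data so the example runs on its own:

```python
import numpy as np
import pandas as pd

# Test data from the question, rebuilt by hand.
df = pd.DataFrame({
    "contact": [1, 2, 33, 66],
    "phone1_x": [1234, 12345, 4444, 55555],
    "phone2_x": [12, 123, 22, 333],
    "phone1_y": [1234, np.nan, 2343321, 55555],
    "phone2_y": [np.nan, 123, 3223455, 333],
    "Match1": [True, False, False, True],
    "Match2": [False, True, False, True],
})

# mask() sets a value to NaN where the boolean condition is True
# and leaves it unchanged elsewhere.
df["phone1_y"] = df["phone1_y"].mask(df["Match1"])
df["phone2_y"] = df["phone2_y"].mask(df["Match2"])
print(df)
```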

For the second output, use numpy.select:

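# Row-level masks as column vectors, so they broadcast across both phone columns
m1 = df[['Match1','Match2']].all(axis=1).to_numpy()[:, None]  # both phones matched
m2 = df[['Match1','Match2']].any(axis=1).to_numpy()[:, None]  # at least one matched
m3 = df[['Match1','Match2']]                                  # per-cell match flags

x = df[['phone1_x','phone2_x']]
y = df[['phone1_y','phone2_y']]

# Unmatched cell in a row with a match -> take x; row without any match -> keep y;
# all remaining cells (the matched ones) -> NaN
df[['phone1_y','phone2_y']] = np.select([m2 & ~m3, ~m2 & ~m3, ~m1],
                                        [x, y, np.nan],
                                        default=np.nan)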
                              
print (df)
   contact  phone1_x  phone2_x   phone1_y   phone2_y  Match1  Match2
0        1      1234        12        NaN       12.0    True   False
1        2     12345       123    12345.0        NaN   False    True
2       33      4444        22  2343321.0  3223455.0   False   False
3       66     55555       333        NaN        NaN    True    True
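
If the np.select conditions are hard to follow, the same second output can be sketched per column with Series.where and Series.mask. This is an alternative formulation, not the answer's own code; the frame is rebuilt from the question's test data and the loop over column-name triples is just one way to organise it:

```python
import numpy as np
import pandas as pd

# Test data from the question, rebuilt by hand.
df = pd.DataFrame({
    "contact": [1, 2, 33, 66],
    "phone1_x": [1234, 12345, 4444, 55555],
    "phone2_x": [12, 123, 22, 333],
    "phone1_y": [1234, np.nan, 2343321, 55555],
    "phone2_y": [np.nan, 123, 3223455, 333],
    "Match1": [True, False, False, True],
    "Match2": [False, True, False, True],
})

# True for rows where at least one of the two phones matched.
any_match = df[["Match1", "Match2"]].any(axis=1)

for x_col, y_col, match_col in [
    ("phone1_x", "phone1_y", "Match1"),
    ("phone2_x", "phone2_y", "Match2"),
]:
    # Keep y in rows without any match; otherwise take the x value.
    filled = df[y_col].where(~any_match, df[x_col])
    # Then blank the cells whose own phone matched.
    df[y_col] = filled.mask(df[match_col])

print(df)
```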

Upvotes: 2

Raymond Toh

Reputation: 799

Use .loc with boolean masks:

# Set phone1_y to NaN where Match1 is True
df.loc[df["Match1"], "phone1_y"] = np.nan

# Set phone2_y to NaN where Match2 is True
df.loc[df["Match2"], "phone2_y"] = np.nan

# For the second output: fill phone1_y from phone1_x where phone1_y is NaN,
# phone1_x is not NaN, and Match1 is False (otherwise the NaN set above is undone)
df.loc[df["phone1_y"].isna() & df["phone1_x"].notna() & ~df["Match1"], "phone1_y"] = df["phone1_x"]

# Same for phone2_y, skipping rows where Match2 is True
df.loc[df["phone2_y"].isna() & df["phone2_x"].notna() & ~df["Match2"], "phone2_y"] = df["phone2_x"]

Upvotes: 1

gal peled

Reputation: 482

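Let me show you the apply function in pandas. The helper below returns None (which becomes a missing value) when the match flag is True, and the original value otherwise: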

def get_new_value(first_val,sec_val):
    if sec_val:
        return None
    return first_val

df['phone1_y'] = df.apply(lambda x: get_new_value(x['phone1_y'],x['Match1']),axis=1)
df['phone2_y'] = df.apply(lambda x: get_new_value(x['phone2_y'],x['Match2']),axis=1)

Make sure you pass axis=1 so the function is applied row by row. I am sure you will manage the last part yourself; see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html for more options.
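
One possible way to handle that last part (the second output from the question) with apply is sketched below. The helper fill_or_clear and its parameters are hypothetical names chosen for illustration, and the frame is rebuilt from the question's test data:

```python
import numpy as np
import pandas as pd

# Test data from the question, rebuilt by hand.
df = pd.DataFrame({
    "contact": [1, 2, 33, 66],
    "phone1_x": [1234, 12345, 4444, 55555],
    "phone2_x": [12, 123, 22, 333],
    "phone1_y": [1234, np.nan, 2343321, 55555],
    "phone2_y": [np.nan, 123, 3223455, 333],
    "Match1": [True, False, False, True],
    "Match2": [False, True, False, True],
})

def fill_or_clear(row, x_col, y_col, match_col):
    """Hypothetical helper: pick the new y value for a single row."""
    if row[match_col]:
        return np.nan      # this phone matched: blank it
    if row["Match1"] or row["Match2"]:
        return row[x_col]  # the other phone matched: take the x value
    return row[y_col]      # no match in the row: keep y

# axis=1 passes one row at a time; args supplies the extra column names.
new_phone1 = df.apply(fill_or_clear, axis=1, args=("phone1_x", "phone1_y", "Match1"))
new_phone2 = df.apply(fill_or_clear, axis=1, args=("phone2_x", "phone2_y", "Match2"))
df["phone1_y"] = new_phone1
df["phone2_y"] = new_phone2
print(df)
```

Row-wise apply calls a Python function once per row, so on large frames a vectorised approach is likely to be faster.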

Upvotes: 1
