illuminato
illuminato

Reputation: 1257

How to add np.nan if Regex condition match in Pandas

I have the following dataframe:

d_test = {
    'latitude' : [40.765004, 45.504641, 40.00000, -34.283007],
    'longitude' : [-73.965961, 122.000000, -90.121679, -118.439891],
    'test': [1,2,3,4],
}
df_test = pd.DataFrame(d_test)

I want to use Regex r'^[+-]?[0-9]*[.][0]*?$ to put np.nan if condition matches. I know how to filter result completely using regex:

df_test[~df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[+-]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)]

the code above gives:

     latitude   longitude     test
0   40.765004   -73.965961     1
3   -34.283007  -118.439891    4

but I need the following result:

     latitude   longitude    test
0   40.765004   -73.965961    1
1   NaN          NaN          2
2   NaN          NaN          3
3   -34.283007  -118.439891   4

I tried to use pandas' where:

~df_[['latitude', 'longitude']].where(lambda x: x.astype(str).str.contains(r'^[+-]?[0-9]*[.][0]*?$', regex=True))

but it does not work (AttributeError: 'DataFrame' object has no attribute 'str')

Upvotes: 0

Views: 100

Answers (3)

Timeless
Timeless

Reputation: 37827

You just need to use boolean indexing with pandas.DataFrame.loc :

m = df_test.apply(lambda x: x.astype(str).str.contains(r'^[+-]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)
​
df_test.loc[m, ["latitude", "longitude"]] = np.NaN

# Output :

print(df_test)
    latitude   longitude  test
0  40.765004  -73.965961     1
1        NaN         NaN     2
2        NaN         NaN     3
3 -34.283007 -118.439891     4

Upvotes: 1

Panda Kim
Panda Kim

Reputation: 13247

make your condintion to cond1

cond1 = ~df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[+-]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)

use pandas where

df_test.where(cond1)

output:

   latitude   longitude
0  40.765004  -73.965961
1        NaN         NaN
2        NaN         NaN
3 -34.283007 -118.439891

Upvotes: 0

Guy
Guy

Reputation: 50899

You could just assign it back to the columns

df_test[['latitude', 'longitude']] = df_test[~df_test[['latitude', 'longitude']].apply(lambda x: x.astype(str).str.contains(r'^[+-]?[0-9]*[.][0]*?$', regex=True)).any(axis=1)]

Output

    latitude   longitude
0  40.765004  -73.965961
1        NaN         NaN
2        NaN         NaN
3 -34.283007 -118.439891

Upvotes: 1

Related Questions