Reputation: 10011
Given a dataframe df
as follows:
id room area check
0 1 A-102 world NaN
1 2 NaN 24 room name is not valid
2 3 B309 NaN NaN
3 4 C·102 25 room name is not valid
4 5 E_1089 hello room name is not valid
5 6 27 NaN NaN
6 7 27 NaN NaN
I want to check whether area
columns is valid data format, if it's either numbers
or NaN
s, then consider it's as valid data, eitherwise, update check
column with area is not a number
.
I tried with df.loc[df.area.str.contains('^\d+$', na = True), 'check'] = 'area is not a number'
, but not get what I needed.
How could I get an expected result like this:
id room area check
0 1 A-102 world area is not a number
1 2 NaN 24 room name is not valid
2 3 B309 NaN NaN
3 4 C·102 25 room name is not valid
4 5 E_1089 hello room name is not valid; area is not a number
5 6 27 NaN NaN
6 7 27 NaN NaN
Thanks for your help at advance.
Upvotes: 2
Views: 74
Reputation: 862396
You are close, only invert mask by ~
:
df.loc[~df.area.str.contains('^\d+$', na = True), 'check'] = 'area is not a number'
print(df)
id room area check
0 1 A-102 world area is not a number
1 2 NaN 24 NaN
2 3 B309 NaN NaN
3 4 C·102 25 NaN
4 5 E_1089 hello area is not a number
5 6 27 NaN NaN
6 7 27 NaN NaN
Or use Series.where
:
df['check'] = df['check'].where(df.area.str.contains('^\d+$', na = True),
'area is not a number')
EDIT:
m1 = df.room.str.contains('([^a-zA-Z\d\-])', na = True)
m2 = df.area.str.contains('^\d+$', na = True)
v1 = 'room name is not valid'
v2 = 'area is not a number'
df['check'] = np.where(m1 & ~m2, v1 + ', ' + v2,
np.where(m1, v1,
np.where(~m2, v2, None)))
print(df)
id room area check
0 1 A-102 world area is not a number
1 2 NaN 24 room name is not valid
2 3 B309 NaN None
3 4 C 102 25 room name is not valid
4 5 E_1089 hello room name is not valid, area is not a number
5 6 27 NaN None
6 7 27 NaN None
Upvotes: 1