ah bon
ah bon

Reputation: 10011

Check whether one column's data type is number or NaN in Pandas

Given a dataframe df as follows:

   id    room   area                    check
0   1   A-102  world                      NaN
1   2     NaN     24   room name is not valid
2   3    B309    NaN                      NaN
3   4   C·102     25   room name is not valid
4   5  E_1089  hello   room name is not valid
5   6      27    NaN                      NaN
6   7      27    NaN                      NaN

I want to check whether area columns is valid data format, if it's either numbers or NaNs, then consider it's as valid data, eitherwise, update check column with area is not a number.

I tried with df.loc[df.area.str.contains('^\d+$', na = True), 'check'] = 'area is not a number', but not get what I needed.

How could I get an expected result like this:

   id    room   area                                          check
0   1   A-102  world                           area is not a number
1   2     NaN     24                         room name is not valid
2   3    B309    NaN                                            NaN
3   4   C·102     25                         room name is not valid
4   5  E_1089  hello   room name is not valid; area is not a number
5   6      27    NaN                                            NaN
6   7      27    NaN                                            NaN

Thanks for your help at advance.

Upvotes: 2

Views: 74

Answers (1)

jezrael
jezrael

Reputation: 862396

You are close, only invert mask by ~:

df.loc[~df.area.str.contains('^\d+$', na = True), 'check'] = 'area is not a number' 

print(df)
   id    room   area                 check
0   1   A-102  world  area is not a number
1   2     NaN     24                   NaN
2   3    B309    NaN                   NaN
3   4   C·102     25                   NaN
4   5  E_1089  hello  area is not a number
5   6      27    NaN                   NaN
6   7      27    NaN                   NaN

Or use Series.where:

df['check'] = df['check'].where(df.area.str.contains('^\d+$', na = True),
                                'area is not a number') 

EDIT:

m1 = df.room.str.contains('([^a-zA-Z\d\-])', na = True)
m2 = df.area.str.contains('^\d+$', na = True)
v1 = 'room name is not valid'
v2 = 'area is not a number'


df['check'] = np.where(m1 & ~m2, v1 + ', ' + v2, 
              np.where(m1, v1,
              np.where(~m2, v2, None)))

print(df)
   id    room   area                                         check
0   1   A-102  world                          area is not a number
1   2     NaN     24                        room name is not valid
2   3    B309    NaN                                          None
3   4   C 102     25                        room name is not valid
4   5  E_1089  hello  room name is not valid, area is not a number
5   6      27    NaN                                          None
6   7      27    NaN                                          None

Upvotes: 1

Related Questions