Toi

Reputation: 137

How to ignore nulls while doing regex matching in Python?

I have a data frame and I am using a regex to check the format of the values in one column. The column contains nulls, and because of them the match does not work. I don't want to drop the nulls or replace them with some other value; I want the match to ignore them. Everything I tried either raised an error or returned None. How do we ignore the null values while doing a match?

code:

df =
  a        b    c
0 rt-0000  abc  1
1          vb   2
2 rt-1234  abc  3
3          op   4
4 rt-123   oip  5

format = 'rt-\d\d\d\d'
if df['a'].isnull().any():
          continue
          correct_df = df[df[key].str.match(format )]
          wrong_df = df[~df[key].str.match(format )]

The output gives: NONE

When I tried without ignoring the nulls, I got an error: 'Cannot mask NaN/Null values'

Expected output:

correct_df:
  a        b    c
0 rt-0000  abc  1
1          vb   2
2 rt-1234  abc  3
3          op   4

wrong_df:
  a        b    c
4 rt-123   oip  5

I tried different if conditions, but I always end up with the same output. Can we ignore the null values?

Upvotes: 1

Views: 1053

Answers (1)

Wiliam

Reputation: 1088

For:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':['rt-0000',np.nan,'rt-1234',np.nan,'rt-123'],
                  'b':['abc','vb','abc','op','oip'],
                  'c':[1,2,3,4,5]})

         a    b  c
0  rt-0000  abc  1
1      NaN   vb  2
2  rt-1234  abc  3
3      NaN   op  4
4   rt-123  oip  5

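You can simply pass na=True to str.match, so that null values count as matches instead of ending up as NaN in the mask (format is the pattern from your question):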

correct_df = df[df.a.str.match(format, na=True)]
wrong_df = df[~df.a.str.match(format, na=True)]

That gives your result:

         a    b  c
0  rt-0000  abc  1
1      NaN   vb  2
2  rt-1234  abc  3
3      NaN   op  4

and

        a    b  c
4  rt-123  oip  5
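
For completeness, here is the whole thing as one self-contained sketch that you can run as is. It assumes the sample frame from above. The names pattern and mask are my own choices, not from your code: pattern replaces format so the built-in of that name is not shadowed, and it is written as a raw string so the \d escapes reach the regex engine unchanged.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks the null entries in column 'a'.
df = pd.DataFrame({
    'a': ['rt-0000', np.nan, 'rt-1234', np.nan, 'rt-123'],
    'b': ['abc', 'vb', 'abc', 'op', 'oip'],
    'c': [1, 2, 3, 4, 5],
})

# Hypothetical name; same pattern as `format` in the question.
pattern = r'rt-\d\d\d\d'

# na=True fills the result for missing values with True, so the mask is
# purely boolean and the null rows are kept on the "correct" side.
mask = df['a'].str.match(pattern, na=True)

correct_df = df[mask]    # rows 0, 1, 2, 3
wrong_df = df[~mask]     # row 4

print(correct_df)
print(wrong_df)
```

Building the mask once and negating it avoids running the regex twice. Note that str.match only anchors the pattern at the start of the string, so a value such as rt-12345 would also pass. If that should count as wrong, str.fullmatch takes the same na argument and requires the whole string to match.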

Upvotes: 1
