Reputation: 137
I have a data frame and I am using a regex to check the pattern of the values in one of its columns. The column contains nulls, and because of the nulls the match fails. I don't want to drop them or replace them with some other value; I want to ignore them, but whatever I try I either get errors or None as output. How do I ignore the null values while doing the match?
code:
df =
a b c
0 rt-0000 abc 1
1 NaN vb 2
2 rt-1234 abc 3
3 NaN op 4
4 rt-123 oip 5
format = r'rt-\d\d\d\d'
if df['a'].isnull().any():
    continue
correct_df = df[df['a'].str.match(format)]
wrong_df = df[~df['a'].str.match(format)]
The output I get is None.
When I tried it without handling the nulls I got an error: 'Cannot mask NaN/null values'.
Expected output:
corrected_df:
a b c
0 rt-0000 abc 1
1 NaN vb 2
2 rt-1234 abc 3
3 NaN op 4
wrong_df:
a b c
4 rt-123 oip 5
I tried using different if conditions but I end up with the same output. Can we ignore the null values?
Upvotes: 1
Views: 1053
Reputation: 1088
For:
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': ['rt-0000', np.nan, 'rt-1234', np.nan, 'rt-123'],
                   'b': ['abc', 'vb', 'abc', 'op', 'oip'],
                   'c': [1, 2, 3, 4, 5]})
a b c
0 rt-0000 abc 1
1 NaN vb 2
2 rt-1234 abc 3
3 NaN op 4
4 rt-123 oip 5
You can simply use:
correct_df = df[df.a.str.match(format, na=True)]
wrong_df = df[~df.a.str.match(format, na=True)]
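If you would rather keep the null handling explicit rather than folding it into the match result, a minimal sketch (assuming the same df and format as above; mask is just a helper name) is:

# Build the match mask with na=False so NaN rows evaluate to False,
# then add the null rows back into the "correct" frame by hand.
mask = df['a'].str.match(format, na=False)
correct_df = df[mask | df['a'].isna()]    # matching rows plus the null rows
wrong_df = df[~mask & df['a'].notna()]    # non-matching, non-null rows only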
That gives your result:
a b c
0 rt-0000 abc 1
1 NaN vb 2
2 rt-1234 abc 3
3 NaN op 4
and
a b c
4 rt-123 oip 5
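If you instead wanted the NaN rows to land in wrong_df, passing na=False flips where they go: match returns False for the null rows, so the ~ mask picks them up along with 'rt-123'.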
Upvotes: 1