Removing entries from Pandas DF beginning with letter and two numbers

Question

How can I replace string entries in a pandas DataFrame that begin with a letter followed by two digits with NaN?

A        B         C          D
Apple    Pear      N45 82f    John 
Cat      P48 hH2   Mary       Sponge 
Hat      P67 De1   Bed        S90 GGGF

Every entry in the DataFrame that begins with a letter followed by two digits should become NaN, whichever column it is in.

I have tried something along the lines of

for columns in df.columns[1:]:
    for i in columns: 
        if i[0].isalpha() and i[1].isdigit and i.[2].isdigit():
            i.replace(i,None)

Unfortunately this does not seem to work. Any help would be appreciated.

Scott Boston · Accepted Answer

You can try this:

df.mask(df.apply(lambda r: r.str.contains(r'^[a-zA-Z]\d{2}')))

Output:

       A     B     C       D
0  Apple  Pear   NaN    John
1    Cat   NaN  Mary  Sponge
2    Hat   NaN   Bed     NaN
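
To make this reproducible, here is a minimal sketch that rebuilds the sample frame from the question and applies the same mask. It assumes every column holds strings; the variable names `starts_with_code` and `result` are only illustrative. It uses `Series.str.match`, which only matches at the start of each string, so a code appearing later in a cell is left alone.

```python
import pandas as pd

# Sample frame reconstructed from the table in the question
df = pd.DataFrame({
    "A": ["Apple", "Cat", "Hat"],
    "B": ["Pear", "P48 hH2", "P67 De1"],
    "C": ["N45 82f", "Mary", "Bed"],
    "D": ["John", "Sponge", "S90 GGGF"],
})

# Boolean frame: True where a cell starts with one letter and two digits.
# Series.str.match anchors the pattern at the beginning of each string.
starts_with_code = df.apply(lambda col: col.str.match(r"[a-zA-Z]\d{2}"))

# mask() replaces the cells where the condition is True with NaN
result = df.mask(starts_with_code)
print(result)
```

Note that `mask` returns a new DataFrame, so assign the result back to `df` if you want to keep it.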

I also like @coldspeed's stack-based approach:

df[~df.stack().str.contains(r'^[a-zA-Z]\d{2}').unstack()]

Output:

       A     B     C       D
0  Apple  Pear   NaN    John
1    Cat   NaN  Mary  Sponge
2    Hat   NaN   Bed     NaN
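
As for why the loop in the question does nothing: iterating over `df.columns` yields the column labels rather than the cell values, and `str.replace` returns a new string instead of changing anything in place. If you prefer an element-wise version that stays close to the original idea, something like the following sketch should work. The helper name `blank_if_code` and the constant `CODE_PREFIX` are made up for illustration, and non-string cells are passed through unchanged.

```python
import re

import numpy as np
import pandas as pd

# Sample frame reconstructed from the table in the question
df = pd.DataFrame({
    "A": ["Apple", "Cat", "Hat"],
    "B": ["Pear", "P48 hH2", "P67 De1"],
    "C": ["N45 82f", "Mary", "Bed"],
    "D": ["John", "Sponge", "S90 GGGF"],
})

# One letter followed by two digits; re.match only matches at the start
CODE_PREFIX = re.compile(r"[a-zA-Z]\d{2}")


def blank_if_code(value):
    """Return NaN for strings that start with a letter and two digits."""
    if isinstance(value, str) and CODE_PREFIX.match(value):
        return np.nan
    return value


# Apply the helper to every cell, one column at a time
cleaned = df.apply(lambda col: col.map(blank_if_code))
print(cleaned)
```

This is usually slower than the vectorised `mask` version on large frames, but it is easier to extend if the rule for what counts as a code becomes more involved.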
