luisfer

Reputation: 2120

Pandas: Filter rows by regex condition

I've read several questions and answers about this, but I must be doing something wrong. I'd appreciate it if someone could point out what it might be.

The first column of my dataframe df should always contain six digits. I'm loading the dataframe from Excel, and some smart user thought it would be funny to add a disclaimer in the first column.

So I have in the first column something like:

['123456', '456789', '147852', 'In compliance with...']

So I need to keep only the valid records. I'm trying:

pat = r'\d{6}'
filter = df[0].str.contains(pat, regex=True)

This returns False for the disclaimer but NaN for the rows that should match, so df[filter] yields nothing.

What am I doing wrong?

Upvotes: 0

Views: 3977

Answers (1)

BoomBoxBoy

Reputation: 1885

You should be able to do that with the following.

You need to select the rows based on the regex filter.

Note that str.contains searches anywhere in the string, so your current regex also matches values with more than six digits. I anchored the pattern with ^ and $ so that it matches exactly six digits.

df = df[df[df.columns[0]].str.contains(r'^[0-9]{6}$', regex=True)]
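
The NaN results in the question are probably not caused by the pattern. A likely explanation, assuming Excel delivered the six-digit cells as integers rather than text, is that the .str methods return NaN for any value that is not a string. Here is a minimal sketch of that situation with made-up data (the sample values and the names raw_mask, mask and valid are mine, not from the question), using astype(str) and str.fullmatch as one possible way around it:

```python
import pandas as pd

# Hypothetical data: numeric Excel cells arrive as integers,
# so only the disclaimer row is an actual string.
df = pd.DataFrame({0: [123456, 456789, 147852, "In compliance with...", 1234567]})

# .str methods return NaN for non-string values, which would explain
# NaN for the "matching" rows and False for the disclaimer.
raw_mask = df[0].str.contains(r"\d{6}", regex=True)

# Casting to str first gives every row a real True/False result;
# fullmatch requires the whole value to be exactly six digits.
mask = df[0].astype(str).str.fullmatch(r"\d{6}")
valid = df[mask]
```

If the column comes through as floats instead (for example 123456.0), the string form will not be six bare digits, so in that case the values would need converting before the match.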

Upvotes: 3
