regex not recognizing matches as True

Question

I have a dataframe with text data and I'm trying to clean out rows with empty content values. I have one row whose content column looks like this:

articles.loc[197040, 'content']
'     '

I've tried cleaning it up with .isnull(), but that doesn't recognize empty strings. So I resorted to regex and tried:

nothing = re.compile(r'\W{1,}')
articles = articles[articles['content'] != nothing]

But this leaves the empty articles in. If I try:

'     ' == nothing

I get False. But the regex tester seems to indicate that that should work. Using r'\W*' also returns False.

The problem persists with other meaningless strings (e.g., a mix of commas and whitespace) when I try other regex combinations.

Thanks for any help.

Edit:

It's also not recognizing equivalence here:

'what.' == re.compile(r'\w*\.')
False

Or here:

'6:45' == r'[^A-Z]{1,}'
False

And so on and so forth.

bogdanciobanu · Accepted Answer

To check whether a regex matches a string, you have to call the pattern's match method instead of testing for equality. With == you are comparing a string with a pattern object, and the two are never equal. Try this:

nothing.match('    ') # out: <_sre.SRE_Match object; span=(0, 4), match='    '>
nothing.match(' , , ,') # out: <_sre.SRE_Match object; span=(0, 6), match=' , , ,'>
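A minimal sketch, assuming Python 3 and only the standard re module, that runs the checks from the question with match() instead of ==. The same mismatch explains why the filter in the question keeps every row: each str value compared with != against a pattern object is True. The sketch also shows a pitfall worth knowing before filtering: match() anchors only at the start of the string, and any Match object is truthy (even an empty one), so re.fullmatch() is the safer test for "nothing but non-word characters". The names blank, is_blank and contents are made up for illustration.

```python
import re

# The pattern from the question: one or more non-word characters.
nothing = re.compile(r'\W{1,}')
blank = ' ' * 5  # hypothetical stand-in for the whitespace-only content value

# == compares a str with a re.Pattern object, so it is always False.
print(blank == nothing)                                  # False
# A raw-string literal is still a plain str; the r prefix does not make a regex.
print('6:45' == r'[^A-Z]{1,}')                           # False

# match() returns a Match object on success and None on failure.
print(nothing.match(blank) is not None)                  # True
print(re.compile(r'\w*\.').match('what.') is not None)   # True
print(re.match(r'[^A-Z]{1,}', '6:45') is not None)       # True

# Caveat: match() only anchors at the start, and every Match object is truthy,
# even an empty one, so it can flag strings that are not blank at all.
print(bool(re.match(r'\W*', 'hello')))                   # True (empty match)
print(bool(nothing.match('  hello')))                    # True (leading spaces)


def is_blank(text: str) -> bool:
    """Hypothetical helper: True if text contains no word characters at all.

    fullmatch() requires the pattern to cover the entire string;
    an empty string also counts as blank.
    """
    return re.fullmatch(r'\W*', text) is not None


# Made-up sample values standing in for the 'content' column.
contents = [blank, ' , , ,', '', 'An actual article.', '  hello']
kept = [c for c in contents if not is_blank(c)]
print(kept)                                              # ['An actual article.', '  hello']
```

With pandas 1.1 or later, the column-level equivalent should be something like articles = articles[~articles['content'].str.fullmatch(r'\W*', na=True)], where na=True makes missing values count as blank so they are dropped as well.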
