snapcrack

Reputation: 1811

'str.contains' not returning values in dataframe

I'm cleaning some text data and I can't locate rows that contain certain strings. If I do a simple membership test with `in`, I get:

'<! [CDATA[! function( d,s, id){varjs, fjs=d. getElementsByTagName( s)[0],p= ^' in articles.loc[25111, 'content']

True

But if I select rows with that exact same string, I get an empty dataframe:

articles[articles['content'].str.contains('<! [CDATA[! function( d,s, id){varjs, fjs=d. getElementsByTagName( s)[0],p= ^')]

id  title   author  date    content year    month   publication category    digital section url stems

Why would this happen?

Upvotes: 3

Views: 1642

Answers (1)

jezrael

Reputation: 862661

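I think some characters in your string are being read as regex syntax. str.contains treats its pattern as a regular expression by default, so characters such as [, (, { and ^ are not matched literally, while the `in` test compares plain text. You need the parameter regex=False in str.contains: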

s = '<! [CDATA[! function( d,s, id){varjs, fjs=d. getElementsByTagName( s)[0],p= ^'
articles[articles['content'].str.contains(s, regex=False)]
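To see the difference on a small scale, here is a minimal sketch. The two-row `articles` frame and the shortened pattern `"[0],p= ^"` are made up for illustration and only stand in for the real data. As a regex, `[0]` is a character class and `^` is a start-of-string anchor, so the default call finds nothing. Escaping the pattern with `re.escape` should work as an alternative to `regex=False`, for example if you still want to pass regex flags.

```python
import re

import pandas as pd

# Hypothetical stand-in for the real data: one row containing the snippet.
articles = pd.DataFrame({
    "content": [
        "var fjs=d. getElementsByTagName( s)[0],p= ^ more text",
        "a perfectly ordinary article",
    ]
})

# Shortened piece of the string from the question (assumption, for brevity).
s = "[0],p= ^"

# Plain substring test, as in the question: purely literal.
in_check = s in articles.loc[0, "content"]

# Default regex=True: "[0]" becomes a character class and "^" an anchor.
regex_hits = articles["content"].str.contains(s)

# regex=False: the pattern is matched character by character.
literal_hits = articles["content"].str.contains(s, regex=False)

# Alternative: keep regex matching but escape the metacharacters first.
escaped_hits = articles["content"].str.contains(re.escape(s))

print(in_check)
print(regex_hits.tolist())
print(literal_hits.tolist())
print(escaped_hits.tolist())
```

If the column may contain missing values, passing `na=False` as well keeps the resulting mask strictly boolean so it can be used for row selection.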

Upvotes: 7
