Reputation: 1811
I'm cleaning some text data and I can't locate the rows that contain certain strings. If I run a simple membership test with in on a single cell, I get:
'<! [CDATA[! function( d,s, id){varjs, fjs=d. getElementsByTagName( s)[0],p= ^' in articles.loc[25111, 'content']
True
But if I select rows with that exact same string, I get an empty dataframe:
articles[articles['content'].str.contains('<! [CDATA[! function( d,s, id){varjs, fjs=d. getElementsByTagName( s)[0],p= ^')]
id title author date content year month publication category digital section url stems
Why would this happen?
Upvotes: 3
Views: 1642
Reputation: 862661
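I think some characters in the string are being read as regex: str.contains treats its pattern as a regular expression by default, and characters such as [, (, . and ^ are regex metacharacters. You need the parameter regex=False in str.contains so the string is matched as a literal substring: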
s = '<! [CDATA[! function( d,s, id){varjs, fjs=d. getElementsByTagName( s)[0],p= ^'
articles[articles['content'].str.contains(s, regex=False)]
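To see the difference for yourself, here is a minimal sketch. The two-row articles frame is a made-up stand-in for the real data (only the search string comes from the question). It compares the default regex search, the literal search, and escaping the pattern with re.escape, which is another option if you want to keep regex mode on.

```python
import re

import pandas as pd

# Search string taken from the question.
s = '<! [CDATA[! function( d,s, id){varjs, fjs=d. getElementsByTagName( s)[0],p= ^'

# Hypothetical stand-in for the real articles DataFrame (assumption:
# one row contains the string, the other does not).
articles = pd.DataFrame({
    'content': ['intro ' + s + ' outro', 'plain text with no script residue'],
})

# Default (regex=True): the string is compiled as a regular expression,
# so [ ... ] becomes a character class and ^ becomes an anchor.
as_regex = articles['content'].str.contains(s)

# regex=False: plain substring search, the same test as Python's `in`.
as_literal = articles['content'].str.contains(s, regex=False)

# Alternative: keep regex mode but escape every metacharacter first.
as_escaped = articles['content'].str.contains(re.escape(s))

print(as_regex.tolist())
print(as_literal.tolist())
print(as_escaped.tolist())

# Boolean indexing with the literal mask returns the matching row.
matches = articles[as_literal]
```

With this sample data the default regex search finds nothing, while both the literal and the escaped searches flag the first row only.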
Upvotes: 7