krthkskmr

Reputation: 471

Dealing with "\n\t\t" with regex

I have the following substring in the string str(dList):

"addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>

I am trying to use re.search to pull out "MA" using this:

state = re.search(r'"addressRegion">\n\t\t\t\t\t\t\t\t\t(.+?)\n\t',str(dList))

However, that doesn't seem to work. I understand this is possibly because of the way "\" is handled, but I can't figure out how to deal with it.

Upvotes: 0

Views: 127

Answers (2)

OneCricketeer

Reputation: 191844

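Regex is really not necessary here. An HTML parser such as BeautifulSoup can pull out the element's text, and strip() removes the surrounding whitespace: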

In [22]: html = '<span class="addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'  # avoid shadowing the built-in str

In [23]: from bs4 import BeautifulSoup

In [24]: soup = BeautifulSoup(html, 'html.parser')

In [25]: soup.text
Out[25]: u'\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t'

In [26]: soup.text.strip()
Out[26]: u'MA'
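
If BeautifulSoup is not installed, roughly the same idea can be sketched with the standard library's html.parser module instead. This is a swapped-in technique, not what the session above uses, and the class name RegionParser and its attributes are made up for illustration. It also assumes every tag nested inside the target element is properly closed.

```python
from html.parser import HTMLParser


class RegionParser(HTMLParser):
    """Hypothetical helper: collects text inside class="addressRegion" elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0    # > 0 while we are inside a matching element
        self.regions = []  # raw text of each matching element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested tag inside a matching element (assumed to be closed later).
            self._depth += 1
        elif dict(attrs).get("class") == "addressRegion":
            self._depth = 1
            self.regions.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.regions[-1] += data


# Sample markup copied from the session above.
markup = '<span class="addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'

parser = RegionParser()
parser.feed(markup)
parser.close()

# strip() drops the newlines and tabs around the value.
states = [text.strip() for text in parser.regions]
print(states)  # ['MA']
```

For real pages BeautifulSoup is likely still the more convenient choice, since it copes with unclosed tags and offers lookups by class.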

Upvotes: 2

yurib

Reputation: 8147

Update: This is how you could do it if you really wanted to use regex, but I think @cricket_007's solution is the better approach.

All you need to do is escape each backslash with another backslash. You can also collapse the repeated '\t' into a single group with the + quantifier:

>>> import re
>>> s = '"addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'
>>> re.search('.*\\n(\\t)+(.*?)\\n(\\t)+.*',s).group(2)
'MA'
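
A slightly more tolerant variant, as a sketch only: \s* absorbs any run of whitespace, so the exact number of tabs no longer matters. The second half covers one possible reason the original pattern failed. It assumes dList is a list of plain strings, which the question does not confirm. In that case str() uses repr() on each element, so the result contains a literal backslash followed by n or t rather than real newline and tab characters.

```python
import re

# Sample from the question, with real newline and tab characters.
s = '"addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'

# \s* matches any amount of whitespace on either side of the value.
pattern = re.compile(r'"addressRegion">\s*(.+?)\s*</span>')
match = pattern.search(s)
state = match.group(1) if match else None
print(state)  # MA

# Assumption: dList is a list of plain strings (d_list is a stand-in name).
# str() of a list shows repr() of each element, so "\n" becomes backslash + n.
d_list = [s]
as_text = str(d_list)

# \\[nt] matches a literal backslash followed by n or t.
literal = re.search(r'"addressRegion">(?:\\[nt])*(.+?)(?:\\[nt])+', as_text)
print(literal.group(1) if literal else None)  # MA
```

If that assumption holds, searching each element of dList directly, rather than str(dList), avoids the escaped form altogether.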

Upvotes: 1
