Reputation: 471
I have the following substring in the string str(dList):
"addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>
I am trying to use re.search to pull out "MA" using this:
state = re.search(r'"addressRegion">\n\t\t\t\t\t\t\t\t\t(.+?)\n\t',str(dList))
However, that doesn't seem to work. I understand this is possibly because of the way "\" is handled, but I can't figure out how to deal with it.
Upvotes: 0
Views: 127
Reputation: 191844
A regex isn't really necessary here; an HTML parser such as BeautifulSoup can pull out the text directly:
In [22]: html = '<span class="addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'
In [23]: from bs4 import BeautifulSoup
In [24]: soup = BeautifulSoup(html, 'html.parser')
In [25]: soup.text
Out[25]: u'\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t'
In [26]: soup.text.strip()
Out[26]: u'MA'
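If installing BeautifulSoup is not an option, the same parse-instead-of-regex idea can be sketched with the standard library's html.parser module. This is a swapped-in technique, not the BeautifulSoup call shown above, and the RegionParser class and its regions attribute are hypothetical names made up for this illustration:

```python
from html.parser import HTMLParser


class RegionParser(HTMLParser):
    """Collects the text inside <span class="addressRegion"> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while inside a matching span
        self.regions = []    # one raw text entry per matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            # Nested span inside a matching one: track it so the end tags pair up.
            self._depth += 1
        elif "addressRegion" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.regions.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.regions[-1] += data


html = '<span class="addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'
parser = RegionParser()
parser.feed(html)
parser.close()

# strip() removes the surrounding newlines and tabs, as soup.text.strip() does.
states = [text.strip() for text in parser.regions]
print(states)
```

For anything beyond a one-off like this, BeautifulSoup is likely the more convenient choice, since it handles the tag matching for you.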
Upvotes: 2
Reputation: 8147
Update: This is how you could do it if you really wanted to use a regex, but I think @cricket_007's solution is the better approach.
All you need to do is escape each backslash with another backslash. You can also replace the repeated '\t' with a quantified group:
>>> import re
>>> s = '"addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>'
>>> re.search('.*\\n(\\t)+(.*?)\\n(\\t)+.*',s).group(2)
'MA'
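One thing that may be worth checking is where the backslashes come from. The sketch below assumes dList is a list containing the markup (the question only shows str(dList), so the exact contents are a guess). Under that assumption, str(dList) gives the list's repr, in which each newline and tab shows up as two literal characters (a backslash followed by n or t), and that would explain why the question's pattern finds nothing:

```python
import re

# Assumption: dList is a list holding the markup, as str(dList) in the question suggests.
dList = ['<span class="addressRegion">\n\t\t\t\t\t\t\t\t\tMA\n\t\t\t\t\t\t\t\t</span>']

# str() of a list uses repr() on its items, so "\n" becomes a backslash plus "n".
as_text = str(dList)

# Pattern from the question: it looks for real newline/tab characters, which are
# no longer present in as_text, so there is no match.
original = re.search(r'"addressRegion">\n\t\t\t\t\t\t\t\t\t(.+?)\n\t', as_text)

# Escaping the backslash matches the literal two-character sequences instead.
escaped = re.search(r'"addressRegion">\\n(?:\\t)+(.+?)\\n', as_text)

# Searching the list element itself avoids the problem; \s* absorbs the whitespace.
direct = re.search(r'"addressRegion">\s*(.+?)\s*<', dList[0])

print(original)
print(escaped.group(1))
print(direct.group(1))
```

If dList turns out to be something other than a plain list of strings, the same idea should still apply: run the search on the underlying string rather than on its str() representation.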
Upvotes: 1