Reputation: 155
I am learning regex and Beautiful Soup by working through the Google tutorial on regular expressions. I am using the HTML files provided on the tutorial website (the exercise set in the set-up section of the tutorial).
The code is the following:
with open(filepath, "r") as f:
    soup = bs(f, 'lxml')
soup.title
out
<title>Popular Baby Names</title>
code:
# find_all() returns every <h3> tag; this page has only one, and it contains the year.
h3 = soup.find_all("h3")
h3[0].get_text()
out
u'Popularity in 1990'
code:
import re
pattern = re.compile(r'.+(\d\d\d\d).+')
string = h3[0].get_text()
pattern.match(string).group(0)
out
AttributeError Traceback (most recent call last)
<ipython-input-61-2e4daef3292c> in <module>()
----> 1 pattern.match(string).group(0)
AttributeError: 'NoneType' object has no attribute 'group'
I cannot explain why match() does not capture the year, as I expected it to.
Your advice will be appreciated.
Upvotes: 0
Views: 109
Reputation: 3470
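Because the pattern expects at least one character after the year. The trailing .+ needs one or more characters, but the string ends right after 1990, so match() returns None. Try .* instead of the final .+; group(1) then holds the year.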
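A minimal sketch of the difference, using the heading text from the question as a plain string so it runs without the HTML file. The variable names (text, strict, relaxed, year, found) are only illustrative, and the re.search variant at the end is an alternative I am suggesting, not something the tutorial requires.

```python
import re

# Heading text taken from the question's output.
text = u'Popularity in 1990'

# Original pattern: the trailing .+ demands at least one character
# after the four digits, and the string ends at the year.
strict = re.compile(r'.+(\d\d\d\d).+')
print(strict.match(text))  # None

# Relaxed pattern: .* also accepts zero trailing characters.
relaxed = re.compile(r'.+(\d\d\d\d).*')
m = relaxed.match(text)
# Check for None before calling group() to avoid the AttributeError.
year = m.group(1) if m else None
print(year)  # 1990

# Alternative (an assumption about what is wanted): search() scans the
# whole string, so no leading or trailing wildcards are needed.
found = re.search(r'\d{4}', text)
print(found.group(0) if found else None)  # 1990
```

Note that group(0) is the entire match (here the whole heading), while group(1) is the text captured by the parentheses, which is the year.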
Upvotes: 1