gk7

Reputation: 155

Pattern matching with regex returns None while it should not

I am learning regex and Beautiful Soup by working through the Google tutorial on regex. I am using the HTML files provided on the tutorial's website (the exercise set linked from the setup section of the tutorial).

The code is the following:

import re
from bs4 import BeautifulSoup as bs
with open(filepath, "r") as f: soup = bs(f, 'lxml')
soup.title

out

<title>Popular Baby Names</title>

code:

h3 = soup.find_all("h3") # find_all() returns every <h3> tag; this page has only one,
                         # and it contains the year

h3[0].get_text() 

out

u'Popularity in 1990'

code:

pattern = re.compile(r'.+(\d\d\d\d).+') 
string = h3[0].get_text()
pattern.match(string).group(0)

out

AttributeError                            Traceback (most recent call last)
<ipython-input-61-2e4daef3292c> in <module>()
----> 1 pattern.match(string).group(0)

AttributeError: 'NoneType' object has no attribute 'group'

I cannot explain why match() returns None instead of capturing the year.

Your advice will be appreciated.

Upvotes: 0

Views: 109

Answers (1)

palako

Reputation: 3470

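Because the trailing .+ requires at least one character after the year, and 'Popularity in 1990' ends with the year, so match() returns None. Try .* instead of the trailing .+, and use group(1) to get just the year (group(0) is the whole match).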
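
To illustrate, here is a minimal sketch that uses the heading text from the question as a plain string, so it runs without Beautiful Soup. The variable names (text, strict, relaxed, year, year_via_search) are only for illustration, and the re.search() variant at the end is an alternative I am adding, not something from the question.

```python
import re

# Heading text taken from the question's output.
text = "Popularity in 1990"

# Original pattern: the trailing .+ needs at least one character after the year.
strict = re.compile(r'.+(\d\d\d\d).+')
print(strict.match(text))  # None, because nothing follows "1990"

# Relaxed pattern: the trailing .* also accepts an empty remainder.
relaxed = re.compile(r'.+(\d\d\d\d).*')
m = relaxed.match(text)

# Check for None before calling group() to avoid the AttributeError.
year = m.group(1) if m else None
print(year)  # 1990

# Alternative (an assumption about what is wanted): search() scans the
# string for the first run of four digits, so no surrounding .+ is needed.
found = re.search(r'\d{4}', text)
year_via_search = found.group(0) if found else None
print(year_via_search)  # 1990
```

If only the year is needed, the search() form is probably simpler, since it does not depend on what comes before or after the digits.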

Upvotes: 1
