Reputation: 111109
A tutorial I have on Regex in python explains how to use the re module in python, I wanted to grab the URL out of an A tag so knowing Regex I wrote the correct expression and tested it in my regex testing app of choice and ensured it worked. When placed into python it failed:
result = re.match("a_regex_of_pure_awesomeness", "a string containing the awesomeness")
# result is None`
After much head scratching I found out the issue, it automatically expects your pattern to be at the start of the string. I have found a fix but I would like to know how to change:
regex = ".*(a_regex_of_pure_awesomeness)"
into
regex = "a_regex_of_pure_awesomeness"
Okay, it's a standard URL regex but I wanted to avoid any potential confusion about what I wanted to get rid of and possibly pretend to be funny.
Upvotes: 7
Views: 1127
Reputation:
Are you using the re.match()
or re.search()
method? My understanding is that re.match()
assumes a "^
" at the beginning of your expression and will only search at the beginning of the text, while re.search()
acts more like the Perl regular expressions and will only match the beginning of the text if you include a "^
" at the beginning of your expression. Hope that helps.
Upvotes: 1
Reputation: 414905
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(your_html)
for a in soup.findAll('a', href=True):
# do something with `a` w/ href attribute
print a['href']
Upvotes: 4
Reputation: 123030
>>> import re
>>> pattern = re.compile("url")
>>> string = " url"
>>> pattern.match(string)
>>> pattern.search(string)
<_sre.SRE_Match object at 0xb7f7a6e8>
Upvotes: 3
Reputation: 14779
In Python, there's a distinction between "match" and "search"; match only looks for the pattern at the start of the string, and search looks for the pattern starting at any location within the string.
Python regex docs
Matching vs searching
Upvotes: 20