Reputation: 19
I have been experimenting with Python's regex module, re.
I decided to write a simple expression that searches for links (href="url") in a file.
Here is my Regex: href *= *(\"|\').*\1
I first tried out my expression on a site called GSkinner. The results are here, along with the code.
When I tried it with Python's re module, I used the following code:
import re

lines = """Code found in link"""
results = re.findall(r"href *= *(\"|\').*\1", lines)
print(results)  # Outputs: ['"', '"'] instead of the two provided links
Why do the results contain only the quote characters instead of the links?
Upvotes: 1
Views: 131
Reputation: 191729
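findall returns only what the capture groups match (or the whole match, if the pattern has no groups). Your only group is the quote character, so that is all you get back. You have to capture the value you want as well: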
r"href *= *(\"|\')(.*?)\1"
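To see how the number of groups changes what findall gives back, here is a small sketch. The sample string is made up for illustration and is not the asker's actual input.

```python
import re

# Hypothetical sample input, not the asker's actual file.
sample = '<a href="a.html">A</a>\n<a href=\'b.html\'>B</a>'

# One group: findall returns only that group's text (the quote character).
one_group = re.findall(r"href *= *(\"|\').*\1", sample)

# Two groups: findall returns one (quote, url) tuple per match.
two_groups = re.findall(r"href *= *(\"|\')(.*?)\1", sample)

print(one_group)
print(two_groups)
```

With two groups each result is a tuple, so the URL sits at index 1 of every item.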
All together you may want to use something like:
results = [x[1] for x in re.findall(r"href *= *(\"|\')(.*?)\1", lines)]
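If indexing into tuples feels awkward, one possible alternative is re.finditer, which yields match objects so the URL can be picked out by group number. This is only a sketch: the helper name extract_hrefs and the sample input are assumptions for illustration.

```python
import re

# Same pattern as above: group 1 is the quote, group 2 is the URL.
HREF_RE = re.compile(r"href *= *(\"|\')(.*?)\1")

def extract_hrefs(text):
    """Return the URL from each href attribute found in text (hypothetical helper)."""
    return [m.group(2) for m in HREF_RE.finditer(text)]

# Hypothetical sample input.
sample = '<a href="http://example.com/a">A</a> <a href=\'/b\'>B</a>'
print(extract_hrefs(sample))
```

The backreference \1 still needs the quote to be a capturing group, which is why the URL ends up as group 2 rather than group 1.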
Upvotes: 1