Reputation: 21
I need to process an HTML page and identify the hyperlinks present in it. My code works when the markup looks like this:
<script type="text/javascript" src="/test/test.html">
I used a simple regex that matches text between double quotes that starts with /,
and it found all the links of this type.
But I don't understand how to get the links when the script tag looks like this:
<script type="text/javascript" src="test/test.html">
because I cannot use the same regex, and if I use a regex that matches any text in double quotes, I also get "text/javascript"
in the output, which I don't want. Can I use seek() to do this?
Thanks.
Upvotes: 0
Views: 477
Reputation: 3061
Try using:
import re
regex = re.compile(r'src="([^"]*)"')
# findall scans the whole page; match() would only test the start of the string
for link in regex.findall(html):
    print(link)
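For illustration, here is a minimal self-contained sketch of this pattern applied to both tags from the question. The sample `html` string and the helper name `extract_src_links` are assumptions made for the example, not part of the original code.

```python
import re

# Hypothetical sample page containing both forms of the tag from the question.
html = '''
<script type="text/javascript" src="/test/test.html"></script>
<script type="text/javascript" src="test/test.html"></script>
'''

# Anchor on the attribute name so other quoted values (such as type=...) are skipped.
SRC_PATTERN = re.compile(r'src="([^"]*)"')

def extract_src_links(page):
    """Return the value of every double-quoted src attribute in the page."""
    return SRC_PATTERN.findall(page)

links = extract_src_links(html)
print(links)
```

Because the pattern keys on `src="` rather than on any quoted string, "text/javascript" is not captured. It only handles double-quoted attributes, and it would also match attributes whose names end in `src`, so a real HTML parser may be a better fit for messy pages.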
Upvotes: 1