Reputation: 482
I have been searching for quite a while for a regex, compatible with Python's re
module, that finds all URLs in an HTML document. The only one I could find is only able to check whether a URL is valid or invalid (with the match
method). I want to do something simple like this:
import requests
html_response = requests.get('http://example.com').text
# url_pattern would be the compiled regex I am looking for
urls = url_pattern.findall(html_response)
I suppose the regex I need (if it exists) would have to handle a whole range of special cases of URLs, so it probably cannot be a one-liner.
Upvotes: 1
Views: 292
Reputation: 495
Use BeautifulSoup instead. It's simple to use and lets you parse HTML pages.
See this answer: How to extract URLs from an HTML page in Python
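To illustrate the parse-instead-of-regex idea, here is a minimal sketch. Note that it swaps in the standard library's html.parser rather than BeautifulSoup, only so that it runs without extra installs; with BeautifulSoup the rough equivalent would be looping over soup.find_all('a', href=True). The names LinkCollector and extract_urls are made up for this example, and it assumes you only care about href and src attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects URLs found in href/src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def extract_urls(html, base_url):
    """Return every href/src URL in the HTML, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


sample = '<a href="/about">About</a> <img src="http://example.com/logo.png">'
print(extract_urls(sample, "http://example.com"))
```

In the question's snippet, html_response would take the place of sample. URLs that appear only in plain text or inside scripts are not picked up by this approach, since it reads tag attributes only.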
Upvotes: 4