Reputation: 647
I have the following regular expression:
re.compile(r"((https?):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", re.MULTILINE|re.UNICODE)
But that doesn't match hashbangs (#!). What do I need to change to get it working? I know I can add ! to the character class with #@%, etc., but that will also match something like
Check this out: http://example.com/something/!!!
And I want to avoid that.
Upvotes: 15
Views: 106021
Reputation: 41
I use this to search for all HTTP and HTTPS URLs. It works like a charm.
URL_PATTERN = r"https?://\S+"
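A minimal sketch of how a whitespace-delimited pattern like this could be applied to the question's example. Note that `\S+` also swallows trailing punctuation, so the sketch strips it afterwards. The `https?://` form of the pattern, the `find_urls` helper and the `TRAILING_PUNCTUATION` set are assumptions for illustration, not part of the answer.

```python
import re

# Assumed variant of the pattern: require "://" and make the "s" optional.
URL_PATTERN = r"https?://\S+"

# Hypothetical set of punctuation stripped from the end of each match.
TRAILING_PUNCTUATION = ".,;:!?"


def find_urls(text):
    """Return every http(s) URL in text, without trailing punctuation."""
    return [m.rstrip(TRAILING_PUNCTUATION) for m in re.findall(URL_PATTERN, text)]


sample = "Check this out: http://example.com/something/!!! and https://twitter.com/#!/user."
print(find_urls(sample))
```

Because only the end of each match is stripped, a hashbang in the middle of a URL survives while the trailing `!!!` does not.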
Upvotes: 2
Reputation: 27
This is the most complete pattern I use:
URL_PATTERN = r'[A-Za-z0-9]+://[A-Za-z0-9%-_]+(/[A-Za-z0-9%-_])*(#|\\?)[A-Za-z0-9%-_&=]*'
Upvotes: 1
Reputation: 3042
It could be very long, but in practice mine works pretty well. Please try this one:
((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*
It matches all of the examples below:
http://wwww.stackoverflow.com
abc.com
http://test.test-75.1474.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
[email protected]
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:[email protected]/etcetc
(www.itmag.com)
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with
www/[email protected]
[email protected].
[email protected]
[email protected]
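A quick way to check this pattern against a few of the listed examples, as a sketch. The `is_url` helper and the choice of `fullmatch` (the whole string must be a URL) are assumptions for illustration. Note that the character classes contain no `!`, so a hashbang URL is not fully matched by this pattern.

```python
import re

# The pattern from this answer, split over two lines for readability.
URL_RE = re.compile(
    r"((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\."
    r"([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*"
)


def is_url(candidate):
    """Hypothetical helper: True if the entire string matches the pattern."""
    return URL_RE.fullmatch(candidate) is not None


for candidate in ("abc.com", "http://www.example.com/etcetc", "http://twitter.com/#!/user"):
    print(candidate, is_url(candidate))
```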
Upvotes: 11
Reputation: 977
Based on this link, we can use the library validators.
For example:
import validators

# validators.url returns True for a valid URL and a falsy failure object otherwise
valid = validators.url('https://codespeedy.com/')
if valid:
    print("URL is valid")
else:
    print("Invalid URL")
Upvotes: 2
Reputation: 26487
This is a common problem. Use the standard library.
For Python, use urlparse (urllib.parse in Python 3).
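As a rough sketch of that approach in Python 3: `urlparse` splits a candidate string into components, and a hashbang simply ends up in the fragment. It parses rather than searches, so it does not find URLs inside free text. The `looks_like_http_url` helper and its acceptance rule (http/https scheme plus a non-empty host) are assumptions for illustration.

```python
from urllib.parse import urlparse


def looks_like_http_url(candidate):
    """Hypothetical check: http(s) scheme and a non-empty network location."""
    parts = urlparse(candidate)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


parts = urlparse("http://twitter.com/#!/user")
print(parts.netloc, parts.fragment)
print(looks_like_http_url("http://twitter.com/#!/user"))
print(looks_like_http_url("example.com/etcetc"))
```

A string without a scheme, such as `example.com/etcetc`, is parsed as a bare path, so this check rejects it.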
Upvotes: 6
Reputation: 3658
I'll admit that I'm a little bit worried about an application that requires a regex like that to match URLs. That said, this seems to work for me:
((https?):((//)|(\\\\))+([\w\d:#@%/;$()~_?\+-=\\\.&](#!)?)*)
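A small check of this pattern against the two cases from the question, as a sketch. The flags are copied from the question, while the `first_url` helper and the sample strings are assumptions for illustration.

```python
import re

# The pattern from this answer, compiled with the flags used in the question.
URL_RE = re.compile(
    r"((https?):((//)|(\\\\))+([\w\d:#@%/;$()~_?\+-=\\\.&](#!)?)*)",
    re.MULTILINE | re.UNICODE,
)


def first_url(text):
    """Hypothetical helper: return the first match in text, or None."""
    match = URL_RE.search(text)
    return match.group(0) if match else None


print(first_url("See http://twitter.com/#!/user now"))
print(first_url("Check this out: http://example.com/something/!!!"))
```

The optional `(#!)?` only accepts `!` directly after `#`, so the hashbang is kept while the trailing `!!!` is left out of the match.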
Upvotes: 1