Reputation: 1534
I'm trying to make a regular expression that checks if some text contains only URLs and whitespace and nothing else, so:
http://www.google.com http://www.stackoverflow.com
would match, but:
http://www.google.com and http://www.stackoverflow.com
would not match.
Is this possible?
Upvotes: 3
Views: 221
Reputation: 13544
This will match only if the string consists of URLs (mailto, news, http, https, ftp, or ftps) separated by single whitespace characters:
^(((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)\s){1,}((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)$
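Assuming Ruby (which another answer here uses), here's a minimal sketch of applying that pattern to the question's two examples; note that, as written, it needs at least two whitespace-separated URLs to match at all:
pattern = %r{^(((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)\s){1,}((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)$}

['http://www.google.com http://www.stackoverflow.com',
 'http://www.google.com and http://www.stackoverflow.com'].each do |text|
  puts "#{text.inspect} => #{!!(text =~ pattern)}"   # => true, then false
end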
Upvotes: 0
Reputation: 160551
Ruby already has a method to extract URLs, so that's a great starting place, rather than reinventing a working wheel:
require 'uri'

[
  'http://www.google.com http://www.stackoverflow.com',
  'http://www.google.com and http://www.stackoverflow.com'
].each do |str|
  print str
  # Every whitespace-separated piece must contain something URI.extract recognizes as a URL.
  if str.split.all? { |u| !URI.extract(u).empty? }
    puts " contains only URLs"
  else
    puts " doesn't contain only URLs"
  end
end
Which, after running, outputs:
http://www.google.com http://www.stackoverflow.com contains only URLs
http://www.google.com and http://www.stackoverflow.com doesn't contain only URLs
This doesn't support all the recognized URL schemes, but it's a starting point. You can specify which ones you want by passing an array of schemes to extract; there's a short sketch of that after the next snippet. You can get IANA's permanent list using:
require 'open-uri'
require 'nokogiri'
# Scrape the scheme names from the first cell of each table row, skipping the header row.
doc = Nokogiri::HTML(open('http://www.iana.org/assignments/uri-schemes.html'))
schemes = doc.at('table table').search('tr').map{ |tr| tr.at('td').text }[1..-1]
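For example, here's a hedged sketch of restricting extraction to particular schemes; the second argument to URI.extract is an array of scheme names, and the sample text is made up for illustration:
require 'uri'

text = 'http://www.google.com mailto:someone@example.com'

URI.extract(text, schemes)            # restrict to the scraped IANA list from above
URI.extract(text, ['http', 'https'])  # => ["http://www.google.com"]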
Upvotes: 1
Reputation: 89557
You can use this regex (it only tests that each whitespace-separated chunk begins with http:// or https://):
/^(?:https?:\/\/\S++\s*+)++$/ =~ text
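A quick check against the question's examples (in Ruby, ^ and $ are line anchors, so \A and \z would be stricter if the text can span multiple lines):
re = /^(?:https?:\/\/\S++\s*+)++$/

p 'http://www.google.com http://www.stackoverflow.com' =~ re        # => 0 (match)
p 'http://www.google.com and http://www.stackoverflow.com' =~ re    # => nil (no match)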
Upvotes: 1
Reputation: 3625
If you really want to use a regex, please try this:
(?<protocol>\w+):\/\/(?<domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@/$,]*
Split the string on whitespace, and check whether each piece matches the regex above.
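A rough sketch of that approach in Ruby; the \A/\z anchors and the only_urls? helper are additions for this example, the anchors being there so each piece has to be a URL in its entirety:
URL_RE = /\A(?<protocol>\w+):\/\/(?<domain>[\w@][\w.:@]+)\/?[\w\.?=%&=\-@\/$,]*\z/

def only_urls?(text)
  text.split.all? { |piece| piece =~ URL_RE }
end

p only_urls?('http://www.google.com http://www.stackoverflow.com')      # => true
p only_urls?('http://www.google.com and http://www.stackoverflow.com')  # => false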
Hope it helps!
Upvotes: 0