Reputation: 1
This is a follow-up question to this post.
I am new to Ruby and want to write a script that searches a file for a pattern. However, I only want to replace part of it, i.e. remove every http:// match, but only when it is followed by a valid URL.
Upvotes: 0
Views: 251
Reputation: 480
If "valid URL" means that the string can be parsed as a URL, then you might try using URI.parse. For example:
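    require 'uri'

    # input_file is the path of the file to scan.
    File.foreach(input_file) do |line|
      # Print the line with every parseable http(s) URL removed.
      puts line.gsub(%r{https?://\S+}) { |url|
        # URI.parse raises for unparseable strings; keep those as they are.
        (URI.parse(url) && '') rescue url
      }
    end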
However, the URI module is very lax. You'll find that strings like not-an-uri are considered valid "generic" URIs.
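To make that concrete, here is a small sketch showing what URI.parse returns for such a string, plus one possible stricter check. The helper name http_url? is my own invention, not part of the URI library, and "has an http(s) scheme and a non-empty host" is just one assumption about what "valid" could mean:

```ruby
require 'uri'

# Hypothetical helper: true only for http(s) URIs that have a non-empty host.
def http_url?(string)
  uri = URI.parse(string)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

p URI.parse('not-an-uri').class      # URI::Generic, no error raised
p http_url?('not-an-uri')            # false
p http_url?('http://example.com/x')  # true
```

This still only looks at the shape of the string; it says nothing about whether the URL actually resolves.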
You might instead check whether the captured URL can be fetched and returns a successful HTTP status. That is significantly more resource intensive, since it makes one network request per URL, so operating over a large input file would be very slow. It could also be considered a security risk, because the script sends requests to whatever hosts appear in the input.
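    require 'uri'
    require 'net/http'

    # True if the URL can be fetched and returns a 2xx status.
    def valid_url?(url)
      uri = URI.parse(url)
      Net::HTTP.get_response(uri).is_a?(Net::HTTPSuccess)
    rescue StandardError
      # Parse errors, DNS failures, timeouts, etc. all count as invalid.
      false
    end

    File.foreach(input_file) do |line|
      # Print the line with every fetchable URL removed.
      puts line.gsub(%r{https?://\S+}) { |url| valid_url?(url) ? '' : url }
    end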
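If the slowness is a concern, one possible mitigation (a sketch, not something I have benchmarked) is to use short timeouts and to check each distinct URL only once. The names reachable? and strip_urls and the 3-second default are assumptions of mine; the HEAD request is also a guess at what is "good enough", since some servers answer HEAD differently from GET:

```ruby
require 'uri'
require 'net/http'

# Hypothetical variant of valid_url?: HEAD request with short timeouts.
def reachable?(url, timeout: 3)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                  open_timeout: timeout, read_timeout: timeout) do |http|
    http.head(uri.request_uri).is_a?(Net::HTTPSuccess)
  end
rescue StandardError
  # Parse errors, DNS failures and timeouts all count as unreachable.
  false
end

# Remove every URL the checker block accepts, calling the checker
# at most once per distinct URL by remembering results in cache.
def strip_urls(line, cache, &checker)
  line.gsub(%r{https?://\S+}) do |url|
    cache[url] = checker.call(url) unless cache.key?(url)
    cache[url] ? '' : url
  end
end

# Usage, with input_file being the path of the file to scan:
#   cache = {}
#   File.foreach(input_file) do |line|
#     puts strip_urls(line, cache) { |url| reachable?(url) }
#   end
```

Passing the checker as a block also lets you try the replacement logic with a fake checker before pointing it at real hosts.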
Upvotes: 1