CP12

Reputation: 1

Ruby: Search file text for a pattern and replace 'part of' it with a given value?

This is a follow-up question to this post.

I am new to Ruby and want to write a script that searches a file for a pattern. However, I only want to replace part of each match: remove every http:// match, but only when it is followed by a valid URL.

Upvotes: 0

Views: 251

Answers (1)

slushie

Reputation: 480

If "valid URL" means that the string can be parsed as a URL, then you might try URI.parse. For example:

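require 'uri'

# input_file is assumed to hold the path of the file to process.
IO.readlines(input_file).each do |line|
  # gsub returns a new string, so the result must be printed or collected.
  cleaned = line.gsub(%r{https?://\S+}) do |url|
    # Drop the URL if it parses; keep it if URI.parse raises.
    URI.parse(url) && '' rescue url
  end
  puts cleaned
end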

However, the URI module is very lax: strings like not-an-uri parse successfully and are treated as valid "generic" URIs.
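To illustrate the point, and one possible way to tighten the check without touching the network, here is a minimal sketch. The helper names strict_http_url? and strip_urls are hypothetical, and requiring an HTTP(S) scheme plus a non-empty host is only one assumed definition of "valid":

```ruby
require 'uri'

# URI.parse accepts almost anything as a "generic" URI.
URI.parse('not-an-uri').class # => URI::Generic

# Hypothetical helper: accept only http/https URIs that carry a host.
def strict_http_url?(str)
  uri = URI.parse(str)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: remove URLs that pass the stricter check.
def strip_urls(text)
  text.gsub(%r{https?://\S+}) { |url| strict_http_url?(url) ? '' : url }
end
```

Note that \S+ also swallows trailing punctuation such as a closing parenthesis or a full stop, so the pattern may need adjusting for prose input.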

You might instead want to check whether the captured URL can actually be fetched and returns a successful HTTP status. That is significantly more resource-intensive, so processing a large input file would be very slow. It could also be considered a security risk, since the script sends requests to whatever URLs appear in the input.

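require 'uri'
require 'net/http'

# True only if the URL parses and a GET request returns a 2xx response.
def valid_url?(url)
  uri = URI.parse(url)
  Net::HTTP.get_response(uri).is_a?(Net::HTTPSuccess)
rescue StandardError
  # Parse errors, DNS failures, timeouts, etc. all count as invalid.
  false
end

IO.readlines(input_file).each do |line|
  cleaned = line.gsub(%r{https?://\S+}) do |url|
    valid_url?(url) ? '' : url
  end
  puts cleaned
end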
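If the same URLs appear many times, the cost of the network check can be reduced somewhat. The following is only a sketch under assumptions: reachable? and strip_reachable_urls are hypothetical names, the 3-second timeout is arbitrary, and some servers do not answer HEAD requests the same way as GET. It caches one result per URL and uses HEAD to avoid downloading response bodies:

```ruby
require 'uri'
require 'net/http'

# Hypothetical helper: HEAD request with short timeouts (assumed 3 s default).
def reachable?(url, timeout: 3)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host

  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: timeout, read_timeout: timeout) do |http|
    http.head(uri.request_uri).is_a?(Net::HTTPSuccess)
  end
rescue StandardError
  false
end

# Hypothetical helper: check each distinct URL once and remember the answer.
# An optional block replaces the network check, e.g. for testing.
def strip_reachable_urls(text, cache = {}, &check)
  check ||= method(:reachable?)
  text.gsub(%r{https?://\S+}) do |url|
    cache[url] = check.call(url) unless cache.key?(url)
    cache[url] ? '' : url
  end
end
```

Passing the same cache hash to every call, one per line of the file, means each distinct URL is requested at most once per run.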

Upvotes: 1
