bragboy

Reputation: 35572

Replace all words before the start of the first word (Regex and Ruby)

Here are my test cases.

Expected:

JUNKINFRONThttp://francium.tech should be http://francium.tech
JUNKINFRONThttp://francium.tech/http should be http://francium.tech/http
francium.tech/http should be francium.tech/http (unaffected)

Actual result:

http://francium.tech
francium.tech/http
http

I am trying to write a regex replace for this. I tried this,

text.sub(/.*http/,'http')

However, my second and third test cases fail because .* is greedy and matches all the way to the last http in the string. It would help if the answer were also case-insensitive. A lazy version with a lookahead still fails the third case:

2.5.0 :001 > url = 'francium.tech/http'
 => "francium.tech/http" 
2.5.0 :002 > url.sub(/^.*?(?=http)/i,'')
 => "http" 
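
To make the two failure modes concrete, here is a minimal sketch that runs both attempts from the question over the three test strings. The variable names are only illustrative.

```ruby
# Inputs copied from the question.
inputs = [
  'JUNKINFRONThttp://francium.tech',
  'JUNKINFRONThttp://francium.tech/http',
  'francium.tech/http'
]

# Greedy: .* runs to the end of the string, then backtracks to the LAST "http".
greedy_results = inputs.map { |s| s.sub(/.*http/, 'http') }

# Lazy with a bare "http" lookahead: stops at the FIRST "http", which is
# still the trailing "/http" when the string has no scheme at all.
lazy_results = inputs.map { |s| s.sub(/^.*?(?=http)/i, '') }

inputs.zip(greedy_results, lazy_results).each do |input, greedy, lazy|
  puts "#{input} | greedy: #{greedy} | lazy: #{lazy}"
end
```

The lazy version fixes the second case but not the third, because a bare http is not distinctive enough to tell a scheme from a path segment.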

Upvotes: 0

Views: 224

Answers (3)

ctwheels

Reputation: 22837

As per my original comments, you can use the first pattern below. If you want a really small performance gain, you can remove one step in the regex by using the second pattern instead. If you're especially concerned with performance, the third one is quicker still. All three match the same text.

^.*?(?=https?://)
^.*?(?=https?:/{2})
^.*?(?=ht{2}ps?:/{2})

Code in use:

strings = [
    "JUNKINFRONThttp://francium.tech",
    "JUNKINFRONThttp://francium.tech/http",
    "francium.tech/http"
]
strings.each { |s| puts s.sub(%r{^.*?(?=https?://)}, '') }

Outputs the following:

http://francium.tech
http://francium.tech/http
francium.tech/http
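
The question also asked for case insensitivity, which the patterns above do not cover as written. A minimal sketch, assuming that adding the i flag to the first pattern is all that is needed; the mixed-case inputs are made up for illustration.

```ruby
# Same pattern as above, with the case-insensitive flag added.
pattern = %r{^.*?(?=https?://)}i

# Hypothetical mixed-case inputs (not from the question).
inputs = [
  'JUNKINFRONTHTTP://francium.tech',
  'junkHttps://francium.tech/http',
  'francium.tech/HTTP'
]

# The lookahead never consumes the scheme, so its original casing is kept.
results = inputs.map { |s| s.sub(pattern, '') }
results.each { |r| puts r }
```

The third string is left alone because HTTP is not followed by ://.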

Upvotes: 2

Night Train

Reputation: 2586

When using regex you should make sure to anchor on a distinctive string like http:// or, better, http://[SOMETHING].[AT_LEAST_TWO_CHARS][MAYBE_A_SLASH] and so on...

This works for your given cases:

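urls = ['JUNKINFRONThttp://francium.tech',
        'JUNKINFRONThttp://francium.tech/http',
        'francium.tech/http']

# Each URL is printed twice, once per technique; both lines should be identical.
urls.each do |url|
  puts url.sub(/^.*?(https?:\/{2})/, '\1')  # capturing group: the matched scheme is put back
  puts url.sub(/^.*?(?=https?:\/{2})/, '')  # positive lookahead: the scheme is never consumed
end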

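With a capturing group, the matched scheme can be reused in the replacement. The other method is a positive lookahead, which checks for the scheme without consuming it, so the replacement can be an empty string.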
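
The stricter http://[SOMETHING].[AT_LEAST_TWO_CHARS][MAYBE_A_SLASH] shape mentioned at the top of this answer could look roughly like the following. This is only a sketch under assumptions: the character classes are one possible reading of that shape, it ignores ports and query strings directly after the host, and the localhost input is made up to show the trade-off.

```ruby
# Lookahead requires: scheme, a host part, a dot, at least two letters,
# then either a slash or the end of the line.
strict = %r{^.*?(?=https?://[^\s/]+\.[a-z]{2,}(?:/|$))}i

inputs = [
  'JUNKINFRONThttp://francium.tech',
  'JUNKINFRONThttp://francium.tech/http',
  'francium.tech/http',
  'JUNKhttp://localhost' # hypothetical: no dot in the host, so no match
]

strict_results = inputs.map { |s| s.sub(strict, '') }
strict_results.each { |r| puts r }
```

The last input shows the cost of being strict: a host without a dot is not recognised, so the junk prefix stays.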

Upvotes: 2

Ponnusamy K

Reputation: 134

I think this may solve your problem.

str1 = 'JUNKINFRONThttp://francium.tech'        # should be http://francium.tech
str2 = 'JUNKINFRONThttp://francium.tech/http'   # should be http://francium.tech/http
str3 = 'francium.tech/http'                     # should be francium.tech/http (unaffected)
str4 = 'JUNKINFRONThttps://francium.tech/http'  # should be https://francium.tech/http

[str1, str2, str3, str4].each do |str|
  puts str.gsub(/^.*(http|https):\/\//i, "\\1://")
end

Result:
http://francium.tech
http://francium.tech/http
francium.tech/http
https://francium.tech/http
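
One thing to watch with the greedy .* in this pattern: if the URL itself contains a second scheme, for example in a query string, the match runs to the last one. A small sketch with a made-up input, compared against a lazy .*? variant:

```ruby
# Hypothetical input with a second scheme inside the query string.
input = 'JUNKhttp://francium.tech/?next=https://example.com'

# Greedy (as in the answer above): keeps only what follows the LAST scheme.
greedy = input.sub(/^.*(http|https):\/\//i, "\\1://")

# Lazy variant: stops at the FIRST scheme, so the whole URL survives.
lazy = input.sub(/^.*?(https?):\/\//i, "\\1://")

puts greedy
puts lazy
```

For the four strings in the answer both variants give the same result, so this only matters when a scheme can appear more than once.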

Upvotes: 2
