Asaf Magen

Reputation: 1104

How to edit a URL string with sed

My Linux repository file contains a link that until now used http with a port number to point to its repository.

baseurl=http://host.domain.com:123/folder1/folder2

I now need a way to change that URL to use https with no port or with a different port. I also need to be able to change the server name, for example from host.domain.com to host2.domain.com.

So my idea was to use sed to match from the start of http up to the first / that comes after the //, capturing everything in between. That would let me change the server name, the port, or the http/https scheme.

The examples below show two cases: in the first, a link using http and port 123 is converted to https; in the second, it is the other way around. Both use the same sed expression so that it stays generic.

I'm currently using this code (echo is just for the example):

WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"

OR

WANTED_URL="http://host.domain.com:123"
echo 'https://host.domain.com/folder1/folder2' | sed -i    "s|http.*://[^/]*|$WANTED_URL|"

Is that the correct way to do it?

Upvotes: 0

Views: 2866

Answers (2)

Ed Morton

Reputation: 203493

Assuming it doesn't really matter if you have 1 sed script or 2 and there isn't a good reason to hard-code the URLs:

$ echo 'http://host.domain.com:123/folder1/folder2' |
    sed 's|\(:[^:]*\)[^/]*|s\1|'
https://host.domain.com/folder1/folder2

$ port='123'; echo 'https://host.domain.com/folder1/folder2' |
    sed 's|s\(://[^/]*\)|\1:'"$port"'|'
http://host.domain.com:123/folder1/folder2

If that isn't what you need then edit your question to clarify your requirements and in particular explain why:

  1. You want to use hard-coded URLs, and
  2. You need 1 script to do both transformations.

and provide concise, testable sample input and expected output that demonstrates those needs (i.e. cases where the above doesn't work).

With regard to what you had:

WANTED_URL="https://host.domain.com"
echo 'http://host.domain.com:123/folder1/folder2' | sed -i "s|http.*://[^/]*|$WANTED_URL|"

The main issues are:

  1. Don't use all-upper-case for non-exported shell variable names to avoid clashes with exported variables and to avoid obfuscating your code (this convention has been around for 40 years so people expect all upper case variables to be exported).
  2. Never enclose any script in double quotes as it exposes the whole script to the shell for interpretation before the command you want to execute even sees it. Instead just open up the single quotes around the smallest script segment possible when necessary, i.e. to expand $y in a script use cmd 'x'"$y"'z' not cmd "x${y}z" because the latter will fail cryptically and dangerously given various input, script text, environment settings and/or the contents of the directory you run it from.
  3. The -i option for sed is to edit a file in-place so you can't use it on an incoming pipe because you can't edit a pipe in-place.
  4. When you let a shell variable expand to become part of a script, you have to take care about the possible characters it contains and how they'll be interpreted by the command given the context the variable expands into. If you let a whole URL expand into the replacement section of a sed script then you have to be careful to first escape any potential backreference characters or script delimiters. See "Is it possible to escape regex metacharacters reliably with sed". If you just let the port number expand then you don't have to deal with any of that.
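
To illustrate points 2 to 4, here is a minimal sketch of what escaping a whole URL before letting it expand into a sed replacement could look like. The function names escape_sed_replacement and replace_url_prefix are made up for this example, and it assumes | is the script delimiter and that the URL contains no newlines:

```shell
#!/bin/sh
# Hypothetical helper: escape the characters that are special in a sed
# replacement when '|' is the delimiter: backslash, '&' (the whole match)
# and the delimiter itself.
escape_sed_replacement() {
    printf '%s\n' "$1" | sed 's/[\\&|]/\\&/g'
}

# Hypothetical helper: read URLs on stdin and replace the leading
# scheme://host[:port] part with the prefix given as $1.
# No -i here: the input is a pipe, not a file.
replace_url_prefix() {
    escaped=$(escape_sed_replacement "$1")
    # Single quotes are opened only around the expanded variable.
    sed 's|^https\{0,1\}://[^/]*|'"$escaped"'|'
}

echo 'http://host.domain.com:123/folder1/folder2' |
    replace_url_prefix 'https://host2.domain.com'
```

With the input above this should print https://host2.domain.com/folder1/folder2. If only the port is variable, as in the commands at the top of this answer, none of the escaping is needed.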

Upvotes: 1

Jean-François Fabre

Reputation: 140168

sed regexes are greedy by default. You can tell sed to consume only non-slashes, like this:

echo 'http://host.domain.com:123/folder1/folder2' | sed -e 's|http://[^/]*|https://host.domain.com|'

result:

https://host.domain.com/folder1/folder2

(BTW you don't have to escape slashes because you are using an alternate separator character.)

The key is using [^/]*, which matches anything but slashes, so it stops matching at the first slash. The * is still greedy, but because the bracket expression excludes slashes the result behaves like a non-greedy match.

You used /.*/ and .* can contain slashes, which is not what you wanted (greedy by default).

Anyway, my approach is different because the expression does not include the trailing slash, so the slash is not removed from the final output.
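
As a sketch of how the same idea could cover both directions asked about in the question, the pattern below accepts either http or https and only touches lines starting with baseurl=. The temporary file, the [myrepo] section and the host2.domain.com target are assumptions made up for the demonstration:

```shell
#!/bin/sh
# Hypothetical repo file, created only for this demonstration.
repo_file=$(mktemp)
printf '%s\n' '[myrepo]' 'name=My repo' \
    'baseurl=http://host.domain.com:123/folder1/folder2' > "$repo_file"

# Only lines starting with "baseurl=" are changed. https\{0,1\} matches
# http or https, and [^/]* stops at the first slash after the host and
# optional port, so the path is kept as it is.
new_contents=$(sed 's|^\(baseurl=\)https\{0,1\}://[^/]*|\1https://host2.domain.com|' "$repo_file")
printf '%s\n' "$new_contents"

rm -f "$repo_file"
```

This prints the modified contents rather than changing the file. To edit the real file in place, GNU sed accepts -i when it is given a file name as an argument (BSD sed expects a backup suffix argument after -i), which is different from using -i on a pipe.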

Upvotes: 2
