Reputation: 587

Regular expression help - what's wrong?

I would like to ask for help with my regex. I need to extract the very last part from each URL. I marked it as 'to_extract' within the example below.

I want to know what's wrong with the following regex when used with sed:

sed 's/^[ht|f]tp.*\///' file.txt

Sample content of file.txt:

http://a/b/c/to_extract
ftp://a/b/c/to_extract
...

I only get correct results for the ftp links, not for the http ones. Thanks in advance for explaining this.

Upvotes: 1

Answers (2)

Reputation: 91

How about using basename:

basename http://a/b/c/to_extract    
to_extract

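You can simply achieve what you want with a loop over the lines of the file (here named ooo):

#!/bin/bash

# Read the file line by line; quoting "$url" prevents word splitting and globbing.
while IFS= read -r url; do
    basename "$url"
done < ooo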
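If starting one basename process per line is a concern, shell parameter expansion can probably do the same job. This is a minimal sketch, assuming each line is a URL without a trailing slash; last_part is a hypothetical helper name, not something from the question.

```shell
# last_part is a hypothetical helper: "${1##*/}" removes the longest prefix
# ending in "/", which leaves only the last path component.
last_part() {
    printf '%s\n' "${1##*/}"
}

last_part 'http://a/b/c/to_extract'
last_part 'ftp://a/b/c/to_extract'
```

Unlike basename, this returns an empty string for input that ends in a slash, so it only fits data shaped like the sample.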

Upvotes: 1

Reputation: 382092

Change [ht|f] to (ht|f); that should give the results you expect.

[abc] means "one character which is a, b or c".

[ht|f] means "one character which is h, t, | or f", not at all what you want.
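To see the difference concretely, here is a small sketch that runs both patterns over the two sample lines from the question. It assumes a sed that accepts -E for extended regex (GNU and BSD sed both do); the variable names are only for illustration.

```shell
# The two sample lines from the question.
sample='http://a/b/c/to_extract
ftp://a/b/c/to_extract'

# Bracket expression: [ht|f] matches exactly ONE character (h, t, | or f).
# "ftp" fits (f followed by tp), but "http" does not (h is followed by tt).
with_brackets=$(printf '%s\n' "$sample" | sed 's/^[ht|f]tp.*\///')

# Group with alternation: (ht|f) matches "ht" or "f", so both schemes match.
with_group=$(printf '%s\n' "$sample" | sed -E 's/^(ht|f)tp.*\///')

printf 'brackets:\n%s\n\ngroup:\n%s\n' "$with_brackets" "$with_group"
```

With the bracket version the http line comes back unchanged, which is the behaviour described in the question; with the group version both lines are reduced to to_extract.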

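Grouping with ( ) and alternation with | are extended regex syntax; in sed's default basic regex mode they are treated as literal characters. With GNU sed you enable extended regex with the -r option (or -E, which BSD/macOS sed also accepts):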

sed -r 's/^(ht|f)tp.*\///' file.txt

If you just want to extract the last part of the URL and skip any line that doesn't match, add -n to suppress the default output and the p flag to print only the lines where a substitution was made:

sed -rn 's/^(ht|f)tp.*\///p' file.txt
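The effect of -n together with the p flag is easiest to see on input that contains a non-matching line. The following is a sketch under the same assumption as before (a sed that accepts -E); the extra "not a url" line and the variable names are made up for the example.

```shell
# Two URLs from the question plus one line the pattern does not match.
input='http://a/b/c/to_extract
ftp://a/b/c/to_extract
not a url'

# Without -n: every line is printed, including the unmatched one.
all_lines=$(printf '%s\n' "$input" | sed -E 's/^(ht|f)tp.*\///')

# With -n and the p flag: only lines where the substitution happened are printed.
only_matches=$(printf '%s\n' "$input" | sed -E -n 's/^(ht|f)tp.*\///p')

printf 'without -n:\n%s\n\nwith -n ... p:\n%s\n' "$all_lines" "$only_matches"
```

The first command keeps the "not a url" line as-is, while the second drops it.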

Upvotes: 8