Reputation: 587
I would like to ask for help with my regex. I need to extract the very last part from each URL. I marked it as 'to_extract' within the example below.
I want to know what's wrong with the following regex when used with sed:
sed 's/^[ht|f]tp.*\///' file.txt
Sample content of file.txt:
http://a/b/c/to_extract
ftp://a/b/c/to_extract
...
I only get correct results for the ftp links, not for the http ones. Thanks in advance for any explanation.
Upvotes: 1
Views: 61
Reputation: 91
How about using basename:
basename http://a/b/c/to_extract
to_extract
You can apply it to every line of the file with a simple loop:
#!/bin/bash
# Print the last path component of each URL listed in file.txt
while IFS= read -r url; do
    basename "$url"
done < file.txt
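If starting a basename process for every line is a concern, the shell's own parameter expansion can do the same trimming. This is only a sketch of an alternative, not what the answer above uses, and last_component is a hypothetical helper name chosen for illustration:

```shell
#!/bin/sh
# last_component (hypothetical helper): strip everything up to and including
# the last "/" with the ${var##pattern} expansion, then print what remains.
last_component() {
    printf '%s\n' "${1##*/}"
}

last_component 'http://a/b/c/to_extract'   # prints: to_extract
last_component 'ftp://a/b/c/to_extract'    # prints: to_extract
```

Note that the two approaches differ for a URL ending in a slash: basename drops the trailing slash and returns the previous component, while the expansion returns an empty string.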
Upvotes: 1
Reputation: 382092
Change [ht|f] to (ht|f); that would give better results.
[abc] means "one character which is a, b or c".
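[ht|f] means "one character which is h, t, | or f", not at all what you want. That is why the ftp links work (f followed by tp) while the http links do not: after the single character h, the regex expects tp but finds ttp.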
On some versions of sed, you'll have to call it with the -r option (-E on BSD/macOS sed) so that extended regex can be used, since grouping with ( ) and alternation with | are extended-regex syntax:
sed -r 's/^(ht|f)tp.*\///' file.txt
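To check the difference between the bracket expression and the group yourself, here is a small sketch that tests the start of each sample URL against both patterns. It uses grep -E purely for the demonstration, and matches is a hypothetical helper name, not something from the question:

```shell
#!/bin/sh
# matches PATTERN STRING (hypothetical helper): succeeds if STRING matches
# the extended regular expression PATTERN.
matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

for url in 'http://a/b/c/to_extract' 'ftp://a/b/c/to_extract'; do
    for pattern in '^[ht|f]tp' '^(ht|f)tp'; do
        if matches "$pattern" "$url"; then
            echo "$pattern matches $url"
        else
            echo "$pattern does not match $url"
        fi
    done
done
```

Only the combination of ^[ht|f]tp with the http URL should report no match; the other three combinations match.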
If you just want to extract the last part of the URL and print nothing else, you probably want the -n option together with the p flag, which prints only the lines where the substitution succeeded:
sed -rn 's/^(ht|f)tp.*\///p' file.txt
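As a quick way to see the effect of -n with p, the sketch below feeds two URLs and one unrelated line through the same substitution. It assumes a sed that accepts -E (GNU and BSD sed both do) in place of -r, and last_part is a hypothetical helper name:

```shell
#!/bin/sh
# last_part (hypothetical helper): read lines on stdin and print only the final
# path component of lines that start with http or ftp; other lines are dropped.
last_part() {
    sed -nE 's/^(ht|f)tp.*\///p'
}

printf '%s\n' 'http://a/b/c/to_extract' 'ftp://a/b/c/to_extract' 'not a url' | last_part
```

This prints to_extract twice; the third line is suppressed because no substitution happened on it.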
Upvotes: 8