user68650

Reputation: 115

Converting Regex to Sed

I have the following regex.

/http:\/\/([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+:[a-zA-Z0-9\-]+\/[a-zA-Z]+\.[a-zA-Z]+/g

It matches the URLs I am after (https://regex101.com/r/sG9zR7/1). I need to adapt it for use on the command line so that it prints the matches, so I changed it to the following:

sed -n 's/.*\(http:\/\/\([a-zA-Z0-9\-]+\.\)+[a-zA-Z0-9\-]+:[a-zA-Z0-9\-]+\/[a-zA-Z]+\.[a-zA-Z]+\).*/\1/p' filename 

(I tried to bold the characters I added but could not.) The additions were as follows:

sed -n 's/.*( (at the beginning)

\ (for the inner parentheses)

).*/\1/p' filename (at the end)

However, I get no results when I execute it.

Upvotes: 2

Views: 1168

Answers (3)

Casimir et Hippolyte

Reputation: 89574

You can achieve the same with an xpath query via xidel:

xidel file.html -e '//a/@href[fn:matches(.,"http://[^/]*:")]/fn:substring-after(.,"=")'

Upvotes: 0

user68650

Reputation: 115

The working command is:

sed -rn 's~.*(http://([a-z0-9\-]+.)*[a-z0-9\-]+:[0-9]+\/[a-z0-9]+.[a-z]+).*~\1~ip' Filename

With the help of the sample supplied (thank you, hjpotter92), I figured out that the escape character did not need to be applied to certain characters. I still have to find out when and how it is applied when using the -r option.
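On the question of when to escape with -r: in extended syntax the characters ( ) + ? { } | are special when left unescaped, and a backslash makes them literal. The dot is special in both syntaxes, so an unescaped . matches any character and \. matches only a literal dot. A minimal sketch of that last point, assuming GNU sed and a made-up two-line input (the host name and port are hypothetical):

```shell
# Hypothetical input: the second line has "_" where a dot would be expected.
sample='http://host.example:8080/index.html
http://host.example:8080/index_html'

# Unescaped dot: "." matches any character, so both lines match.
loose=$(printf '%s\n' "$sample" | sed -rn 's~.*(http://[a-z0-9.-]+:[0-9]+/[a-z]+.[a-z]+).*~\1~p')

# Escaped dot: "\." matches only a literal dot, so only the first line matches.
strict=$(printf '%s\n' "$sample" | sed -rn 's~.*(http://[a-z0-9.-]+:[0-9]+/[a-z]+\.[a-z]+).*~\1~p')

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

If only real file extensions should match, the escaped form is probably the safer choice.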

Upvotes: 0

hjpotter92

Reputation: 80649

Make it a habit to use a delimiter other than / when dealing with URLs. It makes the pattern easier to read.

sed -r -n 's~.*(http://([a-z0-9\-]+\.)+[a-z0-9\-]+:[a-z0-9\-]+/[a-z]+\.[a-z]+).*~\1~ip' file

Note that I use the i modifier to ignore case.

As hwnd comments, you should also pass the -r flag to sed, since your pattern needs + to be treated as a quantifier rather than as a literal character.
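To see why the flag matters, here is a small sketch, assuming GNU sed and a made-up sample line (the URL is hypothetical). Without -r, sed uses basic regular expressions, where a bare + is an ordinary character and groups are written \( \). With -r, + is a quantifier and groups are written with plain ( ).

```shell
# Hypothetical sample line; the host and port are made up for illustration.
line='<a href="http://cdn.example.com:8080/index.html">link</a>'

# Basic syntax (no -r): "+" is a literal plus sign, so nothing matches
# and nothing is printed.
bre_out=$(printf '%s\n' "$line" | sed -n 's~.*\(http://[a-z0-9.-]+:[0-9]+/[a-z]+\.[a-z]+\).*~\1~p')

# Extended syntax (-r): "+" means "one or more" and plain ( ) form the group.
ere_out=$(printf '%s\n' "$line" | sed -r -n 's~.*(http://[a-z0-9.-]+:[0-9]+/[a-z]+\.[a-z]+).*~\1~p')

printf 'without -r: [%s]\n' "$bre_out"
printf 'with -r:    [%s]\n' "$ere_out"
```

The first command prints an empty pair of brackets, which is the same "no results" symptom described in the question.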

Upvotes: 1
