Karl Janssoone

Reputation: 37

Parse an href attribute in a web page with a bash shell

I have a web page that contains a single URL, inside an href attribute.

I need to parse the page and extract the "href" value.

The page contains just one "href" attribute, and the tag it belongs to has no class name.

I am using a bash shell with curl.

So far, I have tried this:

curl http://MyWebsite | grep "href=" | cut -d '>' -f4 | cut -d '<' -f1

But it returns no result. I'm a novice with the bash shell.

Does anyone have an idea? Thanks for your answers.

Upvotes: 0

Views: 444

Answers (2)

Juanan

Reputation: 1420

I know that there is only a single href, but just in case... you can also extract URLs from all anchors inside an HTML document with sed and grep:

curl -s http://MyWebsite  | grep -o '<a .*href=.*>' | sed -e 's/<a /\n<a /g' | sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'
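
As a variation on the same idea, here is a minimal sketch that accepts either quote style using only `grep -E` and `sed -E`. The function name `extract_hrefs_any_quote` and the sample markup are hypothetical, chosen for illustration; it is a regex-based approximation, not a real HTML parser.

```shell
#!/usr/bin/env bash
# extract_hrefs_any_quote (hypothetical helper): read HTML on stdin and print
# every href value, whether it is wrapped in double or single quotes.
extract_hrefs_any_quote() {
  # grep -o prints each match on its own line; -i also matches HREF=.
  grep -Eio "href=(\"[^\"]*\"|'[^']*')" |
    # Strip the leading href=<quote> and the trailing quote.
    sed -E "s/^[^=]*=[\"']//; s/[\"']\$//"
}

# Sample markup standing in for the real page (assumed content).
sample="<p><a href=\"http://example.com/one\">one</a> <a class='x' href='http://example.com/two'>two</a></p>"
printf '%s\n' "$sample" | extract_hrefs_any_quote
```

Against the live page, the function would be fed from curl instead: `curl -s http://MyWebsite | extract_hrefs_any_quote`.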

Upvotes: 0

ralz

Reputation: 543

If you want to keep the href= part:

curl -s http://MyWebsite | grep -E -io 'href="[^\"]+"'

If you only want the URL, without the href= part:

curl -s http://MyWebsite | grep -E -io 'href="[^\"]+"' | awk -F\" '{print$2}'
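
To try this without hitting a real site, the same pipeline can be wrapped in a small function and fed sample markup. This is only a sketch: the function name `extract_hrefs` and the sample HTML are assumptions for illustration, and it only handles double-quoted href values.

```shell
#!/usr/bin/env bash
# extract_hrefs (hypothetical helper): read HTML on stdin and print each
# double-quoted href value on its own line.
extract_hrefs() {
  # grep -o emits one match per line, e.g. href="http://example.com/file.zip";
  # awk splits on the double quote, so field 2 is the URL itself.
  grep -Eio 'href="[^"]+"' | awk -F'"' '{ print $2 }'
}

# Sample markup standing in for the real page (assumed content).
sample='<html><body><p>Download: <a href="http://example.com/file.zip">here</a></p></body></html>'
printf '%s\n' "$sample" | extract_hrefs
```

With the real page, replace the last line with `curl -s http://MyWebsite | extract_hrefs`.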

Upvotes: 1
