Reputation: 37
I have a web page that contains a single URL, in an href attribute.
I need to parse the page and extract the "href" value.
The page contains just one "href" attribute, and its tag has no class name.
I am using a Bash shell with curl.
So far, I have tried this:
curl http://MyWebsite | grep "href=" | cut -d '>' -f4 | cut -d '<' -f1
but it returns no result. I am a novice with Bash.
Does anyone have an idea? Thanks for your answers.
Upvotes: 0
Views: 444
Reputation: 1420
I know that there is only a single href, but just in case... you can also extract URLs from all anchors inside an HTML document with sed and grep:
curl -s http://MyWebsite | grep -o '<a .*href=.*>' | sed -e 's/<a /\n<a /g' | sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'
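To see what that pipeline does without hitting a real site, here is a minimal sketch that feeds it a made-up line of HTML instead of the curl output. The function name anchor_hrefs and the sample markup are my own assumptions for illustration. It also relies on sed accepting \n in the replacement text, which GNU sed does; other sed implementations may behave differently.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the pipeline above; reads HTML from stdin.
anchor_hrefs() {
  grep -o '<a .*href=.*>' |
    # Put each anchor on its own line (the \n in the replacement assumes GNU sed).
    sed -e 's/<a /\n<a /g' |
    # Drop everything up to the opening quote, then everything from the
    # closing quote onward, and delete the empty lines left behind.
    sed -e 's/<a .*href=['"'"'"]//' -e 's/["'"'"'].*$//' -e '/^$/ d'
}

# Assumed sample input: two anchors on one line, one per quoting style.
html="<p><a href=\"http://example.com/a\">A</a> and <a class=\"x\" href='/b'>B</a></p>"

printf '%s\n' "$html" | anchor_hrefs
```

With this sample, the two values come out one per line, which is the point of the intermediate sed that splits the anchors apart.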
Upvotes: 0
Reputation: 543
If you want to keep the href= part:
curl -s http://MyWebsite | grep -E -io 'href="[^\"]+"'
If you only want the URL without the href= part:
curl -s http://MyWebsite | grep -E -io 'href="[^\"]+"' | awk -F\" '{print$2}'
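If the page might use single quotes around the attribute value, the same idea can be wrapped in a small function. This is only a sketch under my own assumptions: the name extract_hrefs and the sample HTML are made up, and it swaps the awk step for sed so that both quote styles are stripped.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every href value found in HTML read from stdin.
extract_hrefs() {
  # -o prints only the matched text, -i ignores case, -E enables extended regex.
  # The pattern accepts a double-quoted or a single-quoted value.
  grep -Eio "href=(\"[^\"]*\"|'[^']*')" |
    # Strip the leading href= with its opening quote, then the closing quote.
    sed -E "s/^[Hh][Rr][Ee][Ff]=[\"']//; s/[\"']\$//"
}

# Sample input standing in for the curl output (assumed markup).
sample='<html><body><p>Download <a href="http://example.com/file.zip">here</a></p></body></html>'

printf '%s\n' "$sample" | extract_hrefs
```

Against the real page it would be used as: curl -s http://MyWebsite | extract_hrefs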
Upvotes: 1