Regex to extract http links from an XML file

Question

I have an xml file with many lines like:

How do I extract just the link - http://store.vcenter.com/stores/en/product/tigers-midi/100?

I tried http://www\.\.com[^<]+ but that captures everything until the end of the line, including quotes and closing XML tags.

I'm using this expression with egrep.

Gilles Quénot · Accepted Answer

Don't parse XML or HTML with regex; use a proper XML/HTML parser.

Check: Using regular expressions with HTML tags

You can use one of the following:

File:

Example with xmllint:

xmllint --xpath '//*[@vip="true"]/@href' file.xml 2>/dev/null

Output:

 href="http://store.vcenter.com/stores/en/product/tigers-midi/100"
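The xmllint output above still carries the attribute name and the quotes, while the question asks for the bare link. A minimal sketch of one way to strip that wrapper with standard tools follows; the variable names are illustrative, and the sample string is simply the output line shown above.

```shell
# Output line as printed by xmllint above (leading space included).
attr_output=' href="http://store.vcenter.com/stores/en/product/tigers-midi/100"'

# grep -o prints each href="..." match on its own line;
# cut then keeps the second double-quote-delimited field, i.e. the value.
url=$(printf '%s\n' "$attr_output" | grep -oE 'href="[^"]*"' | cut -d '"' -f 2)

printf '%s\n' "$url"
```

In practice the same grep and cut stages would be piped directly after the xmllint command. If only the first match is needed, wrapping the expression in the XPath string() function, as in xmllint --xpath 'string(//*[@vip="true"]/@href)' file.xml, should print the value without the wrapper.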

If you need a quick and dirty one-time command, you can do:

egrep -o 'https?://[^"]+' file
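To see why stopping at the quote matters, here is a small comparison against a single sample line. The sample line is hypothetical (the original file is not shown); it only assumes a double-quoted href attribute, as the XPath expression in this answer suggests. grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample line; the real file contents are an assumption.
sample='<a vip="true" href="http://store.vcenter.com/stores/en/product/tigers-midi/100">Tigers Midi</a>'

# [^<]+ only stops at the next "<", so the closing quote, ">" and
# the element text are swept into the match.
too_much=$(printf '%s\n' "$sample" | grep -oE 'https?://[^<]+')

# [^"]+ stops at the closing quote of the attribute value.
url=$(printf '%s\n' "$sample" | grep -oE 'https?://[^"]+')

printf 'stop at "<":   %s\n' "$too_much"
printf 'stop at quote: %s\n' "$url"
```

This only holds when every link sits inside double quotes; single-quoted attributes or URLs in element text would need a different character class, which is one reason the parser-based approach is more robust.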
