selegnasol

Reputation: 91

Trying to 'grep' links from downloaded HTML pages in a bash shell environment without cut, sed, or tr (only e/grep)

In a Linux shell, I am trying to extract the links to JPG files from a downloaded HTML file. So far I have only got to this point:

grep 'http://[:print:]*.jpg' 'www_page.html'

I don't want to use auxiliary commands like 'tr', 'cut', 'sed', etc. 'lynx' is okay!

Upvotes: 0

Views: 3186

Answers (1)

holygeek

Reputation: 16185

Using grep alone, without massaging the file first, is doable but not recommended, as many have pointed out in the comments.
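For what it's worth, a grep-only attempt might look like the sketch below (the file name is the one from the question; note the double brackets around [[:print:]] and the escaped dot, both missing from the attempt above). Because [[:print:]]* is greedy, it can swallow everything up to the last .jpg when several URLs share a line, which is part of why this approach is fragile:

$ grep -o 'http://[[:print:]]*\.jpg' www_page.html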

If you can loosen up your requirements a bit, you can use HTML Tidy to massage the downloaded HTML file so that each HTML element ends up on its own line. The regular expression can then stay as simple as you wanted, something like this:

$ tidy file.html | grep -o 'http://[[:print:]]*\.jpg'

Note the use of the "-o" option to grep, which prints only the matching part of the input.
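If the greedy [[:print:]]* proves too loose (it can join two URLs that land on the same output line into one match), a tighter character class that stops at the closing quote of the src attribute is one possible variant; this is a suggestion layered on top of the answer, not part of it:

$ tidy file.html | grep -o 'http://[^"]*\.jpg'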

Upvotes: 2
