Reputation: 91
In Linux shell, I am trying to return links to JPG files from the downloaded HTML script file. So far I only got to this point:
grep 'http://[:print:]*.jpg' 'www_page.html'
I don't want to use auxiliary commands like 'tr', 'cut', 'sed' etc...'lynx' is okay!
Upvotes: 0
Views: 3186
Reputation: 16185
Using grep alone without massaging the file is doable but not recommended as many have pointed out in the comments.
If you can loosen up your requirements a bit then you can use html tidy to massage the downloaded HTML file so that each html entities are on a single line so that the regular expression can be simpler like you wanted, something like this:
$ tidy file.html|grep -o 'http://[[:print:]]*.jpg'
Note the use of "-o" option to grep to print only the matching part of the input
Upvotes: 2