I'm trying to download specific files with wget using a regular expression.
The files are infographics.jpg and informatics.jpg.
Here is the command line:
wget -r -nd -P test -A jpg --accept-regex '.*\/i.*.jpg'
It downloads every jpg it can find instead of just the two files beginning with an "i".
If I add an "n" after the "i"...
wget -r -nd -P test -A jpg --accept-regex '.*\/in.*.jpg'
... that works perfectly, downloading the two files beginning with "in".
But not otherwise. I read that "/i" means case-insensitive, so I tried different ways to make sure the "i" is treated as a letter rather than a flag, such as "[i]", ... No luck.
Is this a /i problem? And if so, how can I get rid of it?
More likely it's the greediness of the regex. Yours matches from the beginning of the URL up to any "/i" (which might be in a directory name on the path, not in the filename), then up to a ".jpg" sequence of characters that need not be at the end of the URL (the unescaped "." even matches any character). So you need to restrict the regex a bit:
/i[^/]*\.jpg$
This matches a / immediately followed by an i, then any run of characters other than / ([^/]*, so the match cannot jump across parts of the URL and stays within the filename), and finally .jpg anchored to the end of the URL.
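To see the difference, here is a minimal sketch that runs both patterns through grep -E. It assumes a POSIX shell, GNU grep and a few hypothetical URLs (example.com and the directory names are made up, not taken from the question). wget tests --accept-regex against the complete URL and uses POSIX regex syntax by default, so grep -E should behave the same for these simple patterns, but it is only a stand-in for wget's own matching. The question's \/ is written as a plain / here, since an escaped slash is just a literal slash.

```shell
#!/bin/sh
# Hypothetical sample URLs (assumed for illustration only).
urls='https://example.com/images/photo.jpg
https://example.com/pics/infographics.jpg
https://example.com/pics/informatics.jpg
https://example.com/pics/logo.jpg'

# Original pattern: any "/i" anywhere in the URL, then "jpg" somewhere after it.
original=$(printf '%s\n' "$urls" | grep -E '.*/i.*.jpg')

# Restricted pattern: "/i" must start a segment with no further "/",
# and the URL must end in ".jpg".
restricted=$(printf '%s\n' "$urls" | grep -E '/i[^/]*\.jpg$')

printf 'original:\n%s\nrestricted:\n%s\n' "$original" "$restricted"
```

The original pattern also accepts photo.jpg because the /images/ directory supplies the "/i", while the restricted one keeps only the two filenames starting with "i". In the original command this would become --accept-regex '/i[^/]*\.jpg$'.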