Plouf

Reputation: 397

Selectively downloading files with wget and regex fails on /i

I am trying to download specific files with wget and a regular expression.

The files are infographics.jpg and informatics.jpg.

Here is the command line:

wget -r -nd -P test -A jpg --accept-regex '.*\/i.*.jpg'

It downloads every jpg it can find instead of just the two files beginning with an "i".

If I add an "n" after the "i"...

wget -r -nd -P test -A jpg --accept-regex '.*\/in.*.jpg'

... that works perfectly, downloading the two files beginning with "in".

But not with the "i" alone. I read that "/i" means case-insensitive, so I tried different ways to make sure the "i" is taken as a letter rather than a switch, such as "[i]". No luck.

Is this a /i problem? And if so, how can I get rid of it?

Upvotes: 2

Views: 682

Answers (1)

revo

Reputation: 48751

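More likely the cause is the greedy, unrestricted .* in your regex, not a /i switch. Your pattern matches from the beginning of the URL up to any /i (which may sit in a directory part of the path, not in the filename), and then anything up to a jpg sequence preceded by any character, which does not have to be at the end of the URL. With /in the pattern only works because no directory in the path happens to start with "in". So you need to restrict the regex a bit: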

/i[^/]*\.jpg$

This matches a / immediately followed by an i, then stays within the filename ([^/]* cannot cross into another part of the URL), and requires a literal .jpg at the very end of the URL.
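To see the difference without downloading anything, here is a minimal sketch that runs both patterns over a few made-up URLs with grep -E. The example.com URLs and the /images/ and /icons/ directory names are assumptions for illustration only, and grep -E is used as a stand-in for wget's default POSIX regex matching, so treat it as an approximation:

```shell
#!/bin/sh
# Hypothetical URLs (not from the question); each has a directory starting with "i".
urls='http://example.com/images/photo.jpg
http://example.com/images/infographics.jpg
http://example.com/images/informatics.jpg
http://example.com/icons/logo.jpg'

# Pattern from the question (written without the redundant backslash):
# "/i" can match the directory name, so every URL is accepted.
loose_count=$(printf '%s\n' "$urls" | grep -cE '.*/i.*.jpg')

# Restricted pattern: "/i" must start the last path segment,
# and ".jpg" must end the URL.
strict_matches=$(printf '%s\n' "$urls" | grep -E '/i[^/]*\.jpg$')
strict_count=$(printf '%s\n' "$urls" | grep -cE '/i[^/]*\.jpg$')

echo "loose pattern matched:  $loose_count"
echo "strict pattern matched: $strict_count"
echo "$strict_matches"

# The wget call would then look like this (URL left as a placeholder):
# wget -r -nd -P test -A jpg --accept-regex '/i[^/]*\.jpg$' <URL>
```

Only the two files whose names start with "i" survive the restricted pattern, while the original pattern accepts all four URLs because of the directory names.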

Upvotes: 2
