Hakim

Reputation: 11700

How to use regular expressions in wget for rejecting files?

I am trying to download the contents of a website using wget. I used the -R option to reject some file types, but there are other files that I don't want to download. These files have no extension and are named as follows:

string-ID

for example:

newsbrief-02

How can I tell wget not to download these files (that is, files whose names start with a specified string)?

Upvotes: 29

Views: 32336

Answers (2)

Skippy le Grand Gourou

Reputation: 7694

Since (apparently) v1.14, wget accepts regular expressions: --reject-regex and --accept-regex (with --regex-type posix by default; it can be set to pcre if wget was compiled with libpcre support). Note that the regex is matched against the complete URL, not just the file name.

Beware that it seems you can use --reject-regex only once per wget call. That is, to reject on several expressions you have to combine them with | in a single regex:

wget --reject-regex 'expr1|expr2|…' http://example.com
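Applied to the question, a pattern can be tried against a few URLs before starting a real download. The following is only a sketch: the URLs and the helper name would_reject are made up for illustration, and grep -E is used as a stand-in for wget's posix regex matching, which may differ in details.

```shell
# Hypothetical pattern: a final path component starting with "newsbrief-".
pattern='/newsbrief-[^/]*$'

# Succeeds if the URL matches the pattern, i.e. wget would be expected to skip it.
# grep -E only approximates wget's own matching (assumption).
would_reject() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Made-up sample URLs.
for url in \
    'http://example.com/news/newsbrief-02' \
    'http://example.com/news/index.html' \
    'http://example.com/newsbrief-archive/page.html'
do
    if would_reject "$url"; then
        echo "reject $url"
    else
        echo "keep   $url"
    fi
done

# The download itself would then look something like:
# wget -r --reject-regex "$pattern" http://example.com/
```

Because the pattern is anchored to the last path component, a directory named newsbrief-archive is still followed; drop the [^/]*$ part if everything under such paths should be skipped as well.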

Upvotes: 51

Igor Chubin

Reputation: 64563

You cannot specify a regular expression with wget's -R option, but you can specify a wildcard pattern (like a file name pattern in a shell).

The answer looks like:

$ wget -R 'newsbrief-*' ...

You can also use ? and character classes [].

For more information see info wget.
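To get a feel for which names such a pattern covers, the shell's own pattern matching can serve as a rough stand-in. This is a sketch under assumptions: the helper matches_glob and the sample file names are hypothetical, and the shell's case matching is assumed to behave like wget's wildcard matching for these simple patterns.

```shell
# Hypothetical reject pattern using []: "newsbrief-" followed by exactly two digits.
glob='newsbrief-[0-9][0-9]'

# Succeeds if the file name matches the glob. Shell pattern matching is used
# here as an approximation of wget's wildcard matching (assumption).
matches_glob() {
    case "$1" in
        $glob) return 0 ;;
        *)     return 1 ;;
    esac
}

# Made-up sample file names.
for name in newsbrief-02 newsbrief-2012 index.html; do
    if matches_glob "$name"; then
        echo "reject $name"
    else
        echo "keep   $name"
    fi
done

# The corresponding download would look something like:
# wget -r -R "$glob" http://example.com/
```

Quoting the pattern on the wget command line matters: unquoted, the shell may expand it against files in the current directory before wget ever sees it.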

Upvotes: 11
