I'm trying to get rid of lots of strange file names, like index.html?replytocom=653, index.html?replytocom=667, etc.
I'm using the following command:
wget -k -m -r -q -R gif,png,jpg,jpeg,GIF,PNG,JPG,JPEG,?,= -t 1 http://www.website.com/
and also tried:
wget -k -m -r -q -R gif,png,jpg,jpeg,GIF,PNG,JPG,JPEG,?,=,replytocom -t 1 http://www.website.com/
but neither works: those files still get downloaded.
In this case it's not possible to use the reject list (-R rejlist), because the wget documentation says:
Note, too, that query strings (strings at the end of a URL beginning with a question mark (`?`) are not included as part of the filename for accept/reject rules, even though these will actually contribute to the name chosen for the local file. It is expected that a future version of Wget will provide an option to allow matching against query strings.
Therefore you need to use the --reject-regex parameter instead, which is matched against the complete URL, query string included.
wget --reject-regex '(.*)\?(.*)' http://example.com
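To preview which URLs a pattern like this would catch without contacting a server, here is a small sketch. It assumes that wget's default --regex-type=posix behaves like grep -E (a POSIX extended regex matched anywhere in the complete URL); the example.com URLs are made up for illustration.

```shell
#!/bin/sh
# Sketch: preview which URLs '(.*)\?(.*)' would reject.
# Assumption: grep -E approximates wget's default POSIX regex matching,
# which tests the pattern against the complete URL (query string included).
reject_pattern='(.*)\?(.*)'

# Succeeds (exit status 0) if the URL matches the reject pattern.
would_reject() {
  printf '%s\n' "$1" | grep -Eq -- "$reject_pattern"
}

# Hypothetical URLs, for illustration only.
for url in \
  'http://example.com/index.html' \
  'http://example.com/index.html?replytocom=653'; do
  if would_reject "$url"; then
    echo "reject: $url"
  else
    echo "keep:   $url"
  fi
done
```

Keep in mind that this pattern rejects every URL containing a query string, so it may also skip pages you actually want (for instance, paginated listings that use a ?page= parameter).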
Beware that it seems you can use --reject-regex only once per wget call (a later occurrence appears to replace an earlier one). That is, if you want to reject on several patterns, you have to combine them into a single regex with |:
wget --reject-regex 'expr1|expr2|…' http://example.com
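As a sketch of that alternation approach, the pattern below folds a "replytocom" filter and an image-extension filter into one value. The particular alternatives are only illustrative, and grep -E is assumed to stand in for wget's default POSIX extended-regex matching against the complete URL.

```shell
#!/bin/sh
# Sketch: several reject rules combined into ONE --reject-regex value with '|'.
# The alternatives are illustrative; adapt them to the site being mirrored.
# Assumption: grep -E approximates wget's default POSIX regex matching.
reject_pattern='replytocom|\.(gif|png|jpe?g|GIF|PNG|JPE?G)$'

# Succeeds (exit status 0) if any alternative matches somewhere in the URL.
would_reject() {
  printf '%s\n' "$1" | grep -Eq -- "$reject_pattern"
}

# Hypothetical URLs, for illustration only.
for url in \
  'http://example.com/about.html' \
  'http://example.com/logo.PNG' \
  'http://example.com/index.html?replytocom=667'; do
  if would_reject "$url"; then
    echo "reject: $url"
  else
    echo "keep:   $url"
  fi
done
```

The whole string is then passed once, e.g. wget --reject-regex 'replytocom|\.(gif|png|jpe?g|GIF|PNG|JPE?G)$' http://example.com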
So, to answer your question, I'm guessing the solution would be something like:
wget --reject-regex '(.*)replytocom(.*)' (...)
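One way to sanity-check that guess against the URLs from the question is sketched below. It assumes grep -E approximates wget's default POSIX regex matching; because that match is unanchored, the (.*) wrappers should be optional, so the sketch also checks that a bare 'replytocom' pattern agrees.

```shell
#!/bin/sh
# Sketch: check the suggested pattern against the URLs from the question.
# Assumption: grep -E approximates wget's default POSIX regex matching,
# which is unanchored, so '(.*)replytocom(.*)' and 'replytocom' behave alike.
matches() {
  # $1 = pattern, $2 = URL; succeeds if the pattern matches anywhere in the URL.
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

for url in \
  'http://www.website.com/index.html' \
  'http://www.website.com/index.html?replytocom=653' \
  'http://www.website.com/index.html?replytocom=667'; do
  for pattern in '(.*)replytocom(.*)' 'replytocom'; do
    if matches "$pattern" "$url"; then
      echo "reject ($pattern): $url"
    else
      echo "keep   ($pattern): $url"
    fi
  done
done
```

Combined with the flags from the question, the full command might then look like this (untested; -m already implies -r, and the ?,= entries are dropped from -R because the reject list never sees query strings):
wget -k -m -q -t 1 -R gif,png,jpg,jpeg,GIF,PNG,JPG,JPEG --reject-regex 'replytocom' http://www.website.com/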