Mary

Reputation: 1

Wget extract links and save them to a file

I need to download all of the page links from http://en.wikipedia.org/wiki and save them to a file, all with one command (using Wget for Windows).

I tried the following, but the grep command is not recognized under Windows:

wget http://en.wikipedia.org/wiki -q -O - |grep -Po '(?<=href=")[^"]*'

The links in the output file don't need to be in any specific format.

What do you recommend?

Thanks

Upvotes: 2

Views: 2519

Answers (1)

zb226

Reputation: 10500

Multiple problems here:

  1. Tool availability: wget and grep are not available on Windows by default. There are numerous ports, though; have a look here and here.
  2. HTTPS verification: Wikipedia redirects from http:// to https://, so you'll very likely have to add the option --no-check-certificate to the call (or provide a proper certificate store via --ca-certificate); see the sketch just after this list.
  3. Escaping in Windows: to delimit parameters, use double quotes " rather than single quotes '. Any double quote inside a parameter has to be escaped as \".
  4. Escaping in Windows: the caret character ^ has to be escaped like this: ^^.
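For instance, regarding point 2: if you'd rather not rely on the http-to-https redirect, you can request the https:// URL directly. This is a minimal sketch; it still assumes your wget port has no usable certificate store, hence --no-check-certificate:

wget --no-check-certificate "https://en.wikipedia.org/wiki" -q -O -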

All in all this gives you:

wget --no-check-certificate "http://en.wikipedia.org/wiki" -q -O - | grep -Po "(?<=href=\")[^^\"]*"
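And since you want the links saved to a file, you can redirect the output. For example, into a file called links.txt (the filename here is just an illustration):

wget --no-check-certificate "http://en.wikipedia.org/wiki" -q -O - | grep -Po "(?<=href=\")[^^\"]*" > links.txt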

Upvotes: 3
