kimy82

Reputation: 4495

Wget download pdf

I am trying to download a pdf file using wget.

When I do:

wget <url>, it downloads a corrupted file. However, if I run wget -i test.txt with the PDF URL inside this test.txt file, it works and the file is not corrupted.

Does anyone know why?

From the logs I can see the following.

In the first case, it is downloading a not-found page.

Length: 11322 (11K) [text/html] Saving to: ‘media.nl?id=39194.1’

In the second it is a proper pdf.

Length: 58272 (57K) [application/pdf] Saving to: ‘media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf’

Thanks,

Upvotes: 0

Views: 2384

Answers (2)

rootkonda

Reputation: 1743

I got the same issue, but I changed the command to this and it worked fine when I tested it:

wget --no-check-certificate https://www.roofingsuppliesuk.co.uk/core/media/'media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf'

I just added single quotes around the 'media.nl.......pdf' part.

Make sure a file with the same name doesn't already exist. You don't need to add --no-check-certificate unless you get a self-signed certificate error.

Upvotes: 1

Freddy

Reputation: 4718

Put your URL in quotes. Not quoting the URL can lead to strange effects; in your case, the & is interpreted by the shell.

E.g.

wget "https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf"

or

wget 'https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194&c=4667446&h=34c63dbaaa7adc7c8a33&_xt=.pdf'

or with each & escaped

wget https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194\&c=4667446\&h=34c63dbaaa7adc7c8a33\&_xt=.pdf
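For illustration, here is roughly what a POSIX shell does with the unquoted URL: the command line is cut at the first &, wget is run in the background with only the id parameter, and the remaining key=value pairs are parsed as separate variable assignments. That matches the log in the question, where the download is saved as ‘media.nl?id=39194.1’.

# What the shell effectively runs when the URL is unquoted:
wget https://www.roofingsuppliesuk.co.uk/core/media/media.nl?id=39194 &   # backgrounded, URL truncated at the first &
c=4667446 &                                                               # parsed as a variable assignment, run in the background
h=34c63dbaaa7adc7c8a33 &                                                  # same
_xt=.pdf                                                                  # same, in the foreground

This also explains why wget -i test.txt works: wget reads the URL from the file itself, so the shell never gets a chance to interpret the & characters.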

Upvotes: 2
