InSpace

Reputation: 3

Wget fails to download a pdf from a direct link

I am trying to use wget to download a PDF file. I have a direct link to the document and run the following on the command line:

wget -A pdf -nc -np -nd --content-disposition --wait=1 --tries=5 "https://prospektbestellung.nordseetourismus.de/mediafiles/Sonstiges/Ortsprospekte/amrum2021.pdf"

Several of these options are unnecessary for a single file, but they should not affect the outcome, which is:

HTTP request sent, awaiting response... Read error (Unknown error) in headers.

Is there any way to fix this directly using wget or are there any other solutions, preferably in Python, which I could consider?

Upvotes: 0

Views: 811

Answers (4)

Daweo

Reputation: 36360

any other solutions, preferably in Python, which I could consider?

You could use urllib.request.urlretrieve from the built-in urllib.request module as follows:

import urllib.request
urllib.request.urlretrieve("https://prospektbestellung.nordseetourismus.de/mediafiles/Sonstiges/Ortsprospekte/amrum2021.pdf","amrum2021.pdf")

This code downloads the file and saves it as amrum2021.pdf in the current working directory. Unlike requests, urllib.request is part of the standard library, so nothing needs to be installed beyond Python itself.
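If the read error is intermittent, retrying may help, much like the --tries=5 --wait=1 options in the question. The following is only a minimal sketch: download_with_retries and its fetch parameter are hypothetical names introduced here for illustration, and the set of exceptions worth retrying is an assumption.

```python
import http.client
import time
import urllib.request

URL = ("https://prospektbestellung.nordseetourismus.de/mediafiles/"
       "Sonstiges/Ortsprospekte/amrum2021.pdf")


def download_with_retries(url, dest, tries=5, wait=1.0,
                          fetch=urllib.request.urlretrieve):
    """Call fetch(url, dest) up to `tries` times, pausing `wait` seconds
    between failed attempts. Returns the number of attempts used."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            fetch(url, dest)
            return attempt
        except (OSError, http.client.HTTPException) as exc:
            # URLError, timeouts and connection resets are OSError subclasses;
            # malformed or truncated responses raise HTTPException.
            last_error = exc
            if attempt < tries:
                time.sleep(wait)
    raise last_error


if __name__ == "__main__":
    used = download_with_retries(URL, "amrum2021.pdf")
    print(f"downloaded after {used} attempt(s)")
```

The fetch parameter exists only so the retry logic can be exercised without touching the network; in normal use the default urlretrieve is enough.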

Upvotes: 0

balderman

Reputation: 23815

A Python-based solution using requests:

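import requests

url = 'https://prospektbestellung.nordseetourismus.de/mediafiles/Sonstiges/Ortsprospekte/amrum2021.pdf'
# A timeout keeps the call from hanging forever if the server never answers
r = requests.get(url, timeout=30)
# Raise an exception on 4xx/5xx instead of saving an error page as a PDF
r.raise_for_status()
# Write the raw response body to disk in binary mode
with open('my_file.pdf', 'wb') as f:
    f.write(r.content)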
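Before writing the bytes to disk, it can be worth checking that the server really returned a PDF and not an HTML error or block page. A rough sketch follows, assuming the file starts with the usual %PDF- signature; looks_like_pdf and save_if_pdf are hypothetical helper names, not part of requests.

```python
PDF_MAGIC = b"%PDF-"  # signature that PDF files normally begin with


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF signature."""
    return data.startswith(PDF_MAGIC)


def save_if_pdf(data: bytes, path: str) -> None:
    """Write data to path, refusing payloads that do not look like a PDF."""
    if not looks_like_pdf(data):
        raise ValueError("response body does not look like a PDF")
    with open(path, "wb") as f:
        f.write(data)
```

With the requests example, this would be called as save_if_pdf(r.content, 'my_file.pdf') in place of the plain write.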

Upvotes: 1

Artem Tyrnov-Tuchin

Reputation: 51

Your one-liner works for me. I successfully downloaded the PDF with it:

wget -A pdf -nc -np -nd --content-disposition --wait=1 --tries=5 "https://prospektbestellung.nordseetourismus.de/mediafiles/Sonstiges/Ortsprospekte/amrum2021.pdf"

I believe this is a network or firewall issue.

Upvotes: 1

Sunny

Reputation: 56

When you use wget, it sends its own request headers, and the main one that differs from what a browser sends is User-Agent.

You can copy the User-Agent string from your browser, or pick one from an online list, and set it as a header on the request.
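With wget itself this can be done through its --user-agent option. Since the question also asks for Python, here is a minimal standard-library sketch of the same idea. The BROWSER_UA string is only an example value and build_request and download are hypothetical helper names; whether this particular server actually filters on User-Agent is an assumption.

```python
import shutil
import urllib.request

# Example browser-like User-Agent string; substitute the one your browser sends.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

URL = ("https://prospektbestellung.nordseetourismus.de/mediafiles/"
       "Sonstiges/Ortsprospekte/amrum2021.pdf")


def build_request(url, user_agent=BROWSER_UA):
    """Create a Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest, user_agent=BROWSER_UA):
    """Stream the response body to dest using the custom User-Agent."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=30) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


if __name__ == "__main__":
    download(URL, "amrum2021.pdf")
```

If the download still fails with a browser-like User-Agent, the header is probably not the cause and a network or firewall problem becomes more likely.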

Upvotes: 1
