I am trying to download PDFs from my school server, but the way the IT department has set it up, we have to click each link one by one, and there are hundreds of PDF links on the same page.
How can I download files such as "2015-0001.pdf", "2015-0002.pdf", and "2015-0003.pdf" using Python or wget?
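If the files really are numbered sequentially like this, one option is to skip scraping and build the URLs directly. The sketch below assumes the zero-padded four-digit counter seen in the three example names; BASE_URL is a hypothetical placeholder for whatever folder actually holds the PDFs.

```python
# Sketch: build download URLs for sequentially numbered files such as
# 2015-0001.pdf, 2015-0002.pdf, ...
from urllib.parse import urljoin

# Hypothetical placeholder; replace with the real folder URL (keep the trailing slash).
BASE_URL = "https://school.example.edu/docs/"


def sequential_pdf_urls(base_url, year, first, last):
    """Return URLs for {year}-{first:04d}.pdf through {year}-{last:04d}.pdf."""
    return [urljoin(base_url, f"{year}-{n:04d}.pdf") for n in range(first, last + 1)]


if __name__ == "__main__":
    for url in sequential_pdf_urls(BASE_URL, 2015, 1, 3):
        print(url)
```

The printed list can be saved to a file and passed to `wget -i urls.txt`, or each URL can be fetched with `urllib.request.urlretrieve`. Numbers that do not exist on the server will simply return errors, so the guessed range may need adjusting.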
I have tried wget --accept pdf,zip,7z,doc --recursive, but it only grabs the site's index.html file and none of the actual files.
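The same idea as the recursive wget attempt can be sketched in plain Python using only the standard library: fetch the index page, collect every `<a href>`, keep the links whose path ends in one of the accepted extensions, and download each one. This assumes the links appear as ordinary anchors in the HTML (not generated by JavaScript) and that no login is required; INDEX_URL and the `downloads` folder name are hypothetical.

```python
# Sketch: download every linked file with an accepted extension from one index page.
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

INDEX_URL = "https://school.example.edu/docs/index.html"  # hypothetical placeholder
ACCEPT = (".pdf", ".zip", ".7z", ".doc")  # mirrors wget --accept pdf,zip,7z,doc


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_file_links(html, base_url, accept=ACCEPT):
    """Return unique absolute URLs whose path ends with an accepted extension."""
    parser = LinkCollector()
    parser.feed(html)
    urls = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        if urlparse(url).path.lower().endswith(accept) and url not in urls:
            urls.append(url)
    return urls


def download_all(index_url, dest="downloads"):
    """Fetch index_url, then save each matching file into dest (skipping existing ones)."""
    os.makedirs(dest, exist_ok=True)
    with urllib.request.urlopen(index_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    for url in find_file_links(html, index_url):
        filename = os.path.join(dest, os.path.basename(urlparse(url).path))
        if not os.path.exists(filename):
            urllib.request.urlretrieve(url, filename)
            print("saved", filename)
```

Usage would be `download_all(INDEX_URL)` once INDEX_URL points at the real page. If the server requires a login, this sketch will only see the login page, and the request would need the appropriate cookies or credentials added.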
Use Scrapy: http://scrapy.org/
An open source and collaborative framework for extracting the data you need from websites. In a fast, simple, yet extensible way.
See the Scrapy tutorial for how to get started with website scraping.