Reputation: 55
A beginner's question - I have a .txt file containing a list of .html files I want to download. The content of the file looks like this:
http://www.example.com/file1.html
http://www.example.com/file2.html
http://www.example.com/file3.html
I can get Python to download a single file using the code below, but I want it to read each URL from the .txt file and download each .html file.
import urllib.request
url = 'http://www.example.com/file1.html'
urllib.request.urlretrieve(url, '/users/user/Downloads/file1.html')
Is there a simple way of doing this?
Upvotes: 0
Views: 3854
Reputation: 375
First, open your .txt file so that you can iterate over it line by line. Then use a for
loop to go through the URLs one by one, stripping the trailing newline from each line:
import os
import urllib.request

with open('pages.txt') as urls:
    for line in urls:
        url = line.strip()  # drop the trailing newline
        if url:  # skip blank lines
            path = '/users/user/Downloads/{}'.format(os.path.basename(url))
            urllib.request.urlretrieve(url, path)
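Taking the last path segment as the file name works for the URLs in the question, but it can misbehave if a URL carries a query string or ends in a slash. A minimal sketch of a more defensive helper follows; the name `filename_from_url` and the `index.html` fallback are assumptions made for illustration, not part of the original answer.

```python
import posixpath
from urllib.parse import urlparse, unquote


def filename_from_url(url, default="index.html"):
    """Return a local file name derived from the path part of a URL.

    Hypothetical helper: the name and the default value are illustrative.
    """
    # Strip whitespace/newlines left over from reading the .txt file
    path = urlparse(url.strip()).path
    # Only the path is used, so any ?query or #fragment is ignored
    name = posixpath.basename(unquote(path))
    # URLs ending in "/" have no file name, so fall back to the default
    return name or default
```

The result can be joined with the download folder using `os.path.join` before being passed to `urllib.request.urlretrieve`.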
Upvotes: 2
Reputation: 152
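import urllib.request

with open('file.txt') as f:
    for line in f:
        url = line.strip()  # remove the trailing newline before requesting
        if not url:
            continue  # skip blank lines
        # 'your path' is a placeholder; it must end with a path separator
        path = 'your path' + url.split('/')[-1]
        urllib.request.urlretrieve(url, path)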
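With a plain loop like the one above, the first URL that fails (a 404, a timeout, a malformed line) raises an exception and stops the remaining downloads. One possible way to keep going is sketched below. The function name `download_all` and its `fetch` parameter are hypothetical; `fetch` defaults to `urllib.request.urlretrieve` and exists only so the function can be exercised without network access.

```python
import os
import urllib.request
from urllib.error import URLError


def download_all(urls, dest_dir, fetch=urllib.request.urlretrieve):
    """Download each URL into dest_dir; return a list of (url, error) failures.

    Hypothetical helper: `fetch` is any callable taking (url, target_path).
    """
    failed = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # ignore blank lines
        target = os.path.join(dest_dir, url.rsplit('/', 1)[-1])
        try:
            fetch(url, target)
        except (URLError, OSError, ValueError) as exc:
            # HTTPError is a subclass of URLError; record it and move on
            failed.append((url, exc))
    return failed
```

Because an open file is iterable line by line, the file object can be passed directly as `urls`, and the returned list shows which downloads need a retry.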
Upvotes: 3
Reputation: 11
You can use a thread pool (or a process pool) to download the files concurrently, as in this tutorial example:
import requests
from multiprocessing.pool import ThreadPool

def download_url(url):
    print("downloading: ", url)
    # assumes that the last segment after the / represents the file name
    # if url is abc/xyz/file.txt, the file name will be file.txt
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]

    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(file_name, 'wb') as f:
            for data in r:
                f.write(data)
    return url

urls = ["https://jsonplaceholder.typicode.com/posts",
        "https://jsonplaceholder.typicode.com/comments",
        "https://jsonplaceholder.typicode.com/photos",
        "https://jsonplaceholder.typicode.com/todos",
        "https://jsonplaceholder.typicode.com/albums"
        ]

# Run 5 threads. Each call will take the next element in the urls list
results = ThreadPool(5).imap_unordered(download_url, urls)
for r in results:
    print(r)
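The example above uses a hard-coded list, whereas the question starts from a .txt file. A rough sketch of how the two might be connected is shown here. It swaps `requests` for the standard library's `urllib.request.urlretrieve` to avoid a third-party dependency, and the names `read_urls`, `fetch_one`, and `download_from_list` are assumptions chosen for illustration.

```python
import os
import urllib.request
from multiprocessing.pool import ThreadPool


def read_urls(list_path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def fetch_one(url, dest_dir="."):
    """Download one URL into dest_dir and return the local path."""
    target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, target)
    return target


def download_from_list(list_path, worker=fetch_one, threads=5):
    """Apply worker to every URL in the list file using a thread pool."""
    urls = read_urls(list_path)
    with ThreadPool(threads) as pool:
        # imap_unordered yields results as each download finishes,
        # so the returned order may differ from the order in the file
        return list(pool.imap_unordered(worker, urls))
```

Calling `download_from_list('pages.txt')` would save the files into the current directory. Threads suit this job because the work is mostly waiting on the network rather than using the CPU.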
Upvotes: 1