Reputation: 83
I want to scrape the title from each given URL in multiple threads (for example, 5 threads) and save them to one text file. How do I do it, and how do I make sure I safely save the output to one file?
This is my code:
import csv
import requests
requests.packages.urllib3.disable_warnings()

urls = []
with open('Input.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile)
    for row in csvReader:
        urls.append(row[1])

def find_between(s, first, last):
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

def get_title(url):
    try:
        r = requests.get(url)
        html_content = r.text.encode('UTF-8')
        title = find_between(html_content, "<title>", "</title>")
        return title
    except:
        return ""

for url in urls:
    f = open('myfile.txt', 'a')
    f.write(get_title(url) + '\n')
    f.close()
Upvotes: 0
Views: 105
Reputation: 6111
Try to use futures:
1. Create a pool.
2. Submit the function and its parameters.
3. Get the result from each future.
import csv
from concurrent import futures

pool = futures.ThreadPoolExecutor(5)
workers = [pool.submit(get_title, url) for url in urls]
futures.wait(workers)  # blocks until all futures are done, no busy loop
with open('Output.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerows([[worker.result()] for worker in workers])
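To address the "safely save to one file" part of the question: a simple approach is to let only the main thread write the file, collecting results from the pool with `.result()`. Here is a minimal runnable sketch of that pattern; the stub `get_title` and the example URLs are placeholders standing in for the question's scraper, and `myfile.txt` follows the question's filename.

```python
import concurrent.futures

def get_title(url):
    # Stub standing in for the question's requests-based scraper;
    # in real use this would fetch the page and extract <title>.
    return "title of " + url

urls = ["http://example.com/a", "http://example.com/b"]

# Submit every URL to a pool of 5 worker threads.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
    workers = [pool.submit(get_title, url) for url in urls]
    # .result() blocks until each future finishes. Only the main
    # thread ever touches the file, so no lock is needed, and the
    # output order matches the submission order of the URLs.
    with open('myfile.txt', 'w') as f:
        for worker in workers:
            f.write(worker.result() + '\n')
```

If you want workers themselves to write as they finish (e.g. for very long runs), you would instead share a `threading.Lock` and hold it around each write.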
Upvotes: 1