Reputation: 45
I have a list of links (stored in a links.txt file).
This code can save the result of one link, but I do not know how to make it download ALL the source code of ALL the links inside links.txt and SAVE IT AS ONE SINGLE text file for the next step of processing...
import urllib.request
urllib.request.urlretrieve("https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1", "result.txt")
Example links from links.txt:
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=2
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=3
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=4
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=5
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=6
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=7
....
Upvotes: 3
Views: 75
Reputation: 700
import urllib.request

with open('links.txt', 'r') as f:
    links = f.readlines()

for link in links:
    # strip the trailing newline that readlines() keeps on each line
    with urllib.request.urlopen(link.strip()) as page:
        # get html text
        html = page.read().decode('utf-8')
    # append html to file ('a' adds after what is already there;
    # delete result.txt before re-running to start fresh)
    with open('result.txt', 'a') as out:
        out.write(html)
You could also use the requests library, which I find much more readable:
pip install requests
import requests

with open('links.txt', 'r') as f:
    links = f.readlines()

for link in links:
    response = requests.get(link.strip())
    html = response.text
    # append html to file ('a' adds after what is already there)
    with open('result.txt', 'a') as out:
        out.write(html)
Use a for loop to generate the page links, since the only thing that changes is the page number:
links = [
    f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'
    for n in range(1, 10)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
]
or generate them as you go along:
for n in range(1, 10):
    link = f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'
    [...]
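Putting those pieces together, here is a minimal sketch of the whole thing, generating the links and appending each page to a single file (the 9-page range is just an assumption for illustration; adjust it to however many result pages you need):

import requests

# assumption for illustration: pages 1..9
with open('result.txt', 'w') as out:
    for n in range(1, 10):
        link = f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'
        response = requests.get(link)
        out.write(response.text)

Opening result.txt once in 'w' mode also means a re-run starts with a clean file instead of appending to the previous output.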
Upvotes: 4
Reputation: 397
Actually, it's usually better to use the requests lib, so you should start by installing it:
pip install requests
Then I'd propose reading links.txt line by line, downloading all the data you need, and storing it in a single output.txt file:
import requests

data = []

# collect all the data from all links in the file
with open('links.txt', 'r') as links:
    for link in links:
        # strip() drops the newline that iterating over a file keeps
        response = requests.get(link.strip())
        data.append(response.text)

# put everything collected into a single file
with open('output.txt', 'w') as output:
    for chunk in data:
        print(chunk, file=output)
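If the pages are large, you don't have to hold them all in memory first; a minimal variant of the same idea writes each page out as soon as it is downloaded:

import requests

# stream each page straight to the output file instead of buffering a list
with open('links.txt', 'r') as links, open('output.txt', 'w') as output:
    for link in links:
        response = requests.get(link.strip())
        print(response.text, file=output)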
Upvotes: 2