user13602012

Reputation: 45

In Python - how to save multiple HTML source codes to one single text file

I have a list of links (stored in the links.txt file).

This code can save the result of one link, but I do not know how to make it download the source code of ALL the links inside links.txt and save them as ONE SINGLE text file for the next step of processing.

import urllib.request    
urllib.request.urlretrieve("https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1", "result.txt")

Example links from links.txt

https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=2
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=3
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=4
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=5
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=6
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=7
....

Upvotes: 3

Views: 75

Answers (2)

Mensch

Reputation: 700

urllib

import urllib.request

with open('links.txt', 'r') as f:
    # strip trailing newlines so urlopen gets clean urls
    links = [line.strip() for line in f if line.strip()]

# open the output file once so every page is appended to the same file
with open('result.txt', 'w') as out:
    for link in links:
        with urllib.request.urlopen(link) as response:
            # get html text
            html = response.read().decode('utf-8')
            out.write(html)
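
Note that result.txt is opened once before the loop; opening it in write mode inside the loop would truncate the file on every iteration and leave only the last page.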

requests

You could also use the requests library, which I find much more readable.

pip install requests
import requests

with open('links.txt', 'r') as f:
    links = [line.strip() for line in f if line.strip()]

# open the output file once so every page ends up in the same file
with open('result.txt', 'w') as out:
    for link in links:
        response = requests.get(link)
        out.write(response.text)

Use a loop for page navigation

Use a for loop to generate the page links, since the only thing that changes is the page number.

links = [
  f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'
  for n in range(1, 10) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
]

or as you go along

for n in range(1, 10):
  link = f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'

  [...]
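
Putting the two together, a minimal sketch (assuming nine result pages and the same result.txt output file as above):

import requests

with open('result.txt', 'w') as out:
    for n in range(1, 10):
        link = f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'
        response = requests.get(link)
        # append this page's html to the single output file
        out.write(response.text)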

Upvotes: 4

Andrej Zacharevicz

Reputation: 397

Actually, it's usually better to use the requests lib, so you should start by installing it:

pip install requests

Then I'd propose reading links.txt line by line, downloading all the data you need, and storing it in the file output.txt:

import requests

data = []
# collect the html from all links in the file
with open('links.txt', 'r') as links:
    for link in links:
        link = link.strip()
        if not link:
            continue
        response = requests.get(link)
        data.append(response.text)

# put everything collected into a single file
with open('output.txt', 'w') as output:
    for chunk in data:
        print(chunk, file=output)
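
If links.txt is long, you could also write each page to output.txt as soon as it is downloaded instead of collecting everything in data first, which keeps memory usage low.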

Upvotes: 2
