david

Reputation: 7

Web scraping multiple URLs for H1 tags using a text file

I am stuck trying to implement a Python 3 program. I want to web scrape printer status pages (for example http://192.168.1.10, which is an HP LaserJet). The scraper should visit about 20 different printer page URLs and grab the H1 tags, where the printer model is stored.

I am new to Python. I would like to keep the URLs in a .txt file and use a for loop that takes each URL in turn as a variable.

My current code works for a single URL, but I don't know how to phrase what I'm looking for, so I can't work out how to read a text file and use each line as a variable.

Here is an example URL text file:

http://192.168.1.10
http://192.168.1.11
http://192.168.1.12
...etc., one URL per line

My Python 3 code looks like this:

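import requests
from bs4 import BeautifulSoup

# Fetch a single printer page and parse its HTML
page = requests.get('http://192.168.1.10/')
soup = BeautifulSoup(page.text, 'html.parser')
# The printer model is in the H1 tags inside the element with class "mastheadTitle"
masthead = soup.find(class_='mastheadTitle')

pagehp = masthead.find_all('h1')

for h1 in pagehp:
    print(h1.prettify())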

This is the line I want to replace:

page = requests.get('http://192.168.1.10/')

How can I change that so it reads urls.txt in a loop and uses the URL on each line as that string?

Upvotes: 0

Views: 1067

Answers (2)

Erisan Olasheni

Reputation: 2905

You can use Python's built-in open() function like this:

import requests
from bs4 import BeautifulSoup

url_file = "url_file.txt"  # write one URL per line in url_file.txt

# Now let's read the URLs from url_file.txt
with open(url_file, "r") as f:
    url_pages = f.read()
# Split the contents into a list of lines so we can iterate over it;
# splitlines() does not leave an empty string after a trailing newline
pages = url_pages.splitlines()

# Now run a for loop to visit the URLs one by one
for single_page in pages:
    url = single_page.strip()
    if not url:  # skip blank lines, which would make requests.get fail
        continue
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    masthead = soup.find(class_='mastheadTitle')
    if masthead is None:  # this page has no element with that class
        continue

    for h1 in masthead.find_all('h1'):
        print(h1.prettify())
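A detail worth knowing when reading the whole file with read(): text files usually end with a newline, so splitting the contents on "\n" leaves an empty string as the last element, and requests.get("") then raises an error. This standard-library-only snippet, using a made-up file content string, shows the difference between split("\n") and splitlines():

```python
# Hypothetical file contents: two URLs, ending with a newline as most editors save it
contents = "http://192.168.1.10\nhttp://192.168.1.11\n"

by_split = contents.split("\n")   # keeps an empty string after the final newline
by_lines = contents.splitlines()  # no trailing empty string

print(by_split)
print(by_lines)

# Filtering out blank entries makes either approach safe
urls = [u.strip() for u in by_split if u.strip()]
print(urls)
```

Filtering blanks is still useful with splitlines(), because a file can contain empty lines in the middle too.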

Upvotes: 2

igon

Reputation: 3046

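import requests

with open("urls.txt") as f:
    for line in f:  # a file object yields one line at a time
        url = line.strip()  # drop the trailing newline
        if url:  # skip blank lines
            page = requests.get(url)
            ...  # parse `page` with BeautifulSoup as in the question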
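Iterating over the file object, as in this answer, reads the file one line at a time, and lines keep their trailing newline, which is why strip() is needed. A slightly fuller sketch, assuming the same one-URL-per-line layout; read_urls is a hypothetical helper name, and the temporary file only stands in for urls.txt in the demo:

```python
import os
import tempfile


def read_urls(path):
    """Yield each non-blank line of the file at `path`, stripped of whitespace."""
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object yields one line at a time
            url = line.strip()  # remove the trailing newline and stray spaces
            if url:  # skip blank lines, e.g. an empty line at the end
                yield url


if __name__ == "__main__":
    # Demo: a temporary file stands in for urls.txt
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("http://192.168.1.10\n\nhttp://192.168.1.11\n")
        demo_path = tmp.name
    try:
        for url in read_urls(demo_path):
            print(url)  # in the scraper, requests.get(url) would go here
    finally:
        os.remove(demo_path)
```

In the scraper, `for url in read_urls("urls.txt"):` would then replace the hard-coded requests.get('http://192.168.1.10/') call with requests.get(url).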

Upvotes: 0
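With about 20 printers, some may be switched off or unreachable when the script runs. requests.get has no timeout unless one is passed, so an unreachable host can stall the loop for a long time, and an uncaught connection error stops the script. A minimal sketch, assuming requests is installed; fetch_pages and the 5-second timeout are illustrative choices, not part of the answers above:

```python
import requests


def fetch_pages(urls, timeout=5.0):
    """Yield (url, html) pairs; html is None when the request failed."""
    for url in urls:
        try:
            # timeout (seconds) bounds connecting and waiting for the server to respond
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()  # treat 4xx/5xx responses as failures too
        except requests.RequestException as exc:
            # Covers connection errors, timeouts, invalid URLs and HTTP errors
            print(f"skipping {url}: {exc}")
            yield url, None
        else:
            yield url, response.text
```

Each html value that is not None can then be passed to BeautifulSoup(html, 'html.parser') exactly as in the question.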
