Reputation: 7
I am stuck trying to implement a Python 3 program. I want to scrape printer status pages (for example http://192.168.1.10, which is an HP LaserJet). The scraper should visit about 20 different printer URLs and grab the H1 tags, which is where the printer model is stored.
I am new to Python, and I would like to keep the URLs in a txt file and use a for loop so that each URL is used as a variable.
My current code, shown below, works for a single URL, but I don't know how to phrase what I am looking for, so I can't figure out how to read a text file and use each line as a variable.
Here is an example of the URL text file:
http://192.168.1.10
http://192.168.1.11
http://192.168.1.12
...etc., one URL per line
My Python 3 code looks like this:
import requests
from bs4 import BeautifulSoup

page = requests.get('http://192.168.1.10/')
soup = BeautifulSoup(page.text, 'html.parser')
page = soup.find(class_='mastheadTitle')
pagehp = page.find_all('h1')
for page in pagehp:
    print(page.prettify())
The line I need to change is this one:
page = requests.get('http://192.168.1.10/')
How can I change that to read from my urls.txt and loop over it, so that the URL on each line is used as that string?
Upvotes: 0
Views: 1067
Reputation: 2905
You can use Python's built-in open() function like this:
import requests
from bs4 import BeautifulSoup

url_file = "url_file.txt"  # The URLs should be written one per line in url_file.txt

with open(url_file, "r") as f:
    url_pages = f.read()

# Split the file contents into a list of lines so we can iterate over the URLs
pages = url_pages.split("\n")  # Split by lines using \n

# Now run a for loop to visit the URLs one by one
for single_page in pages:
    url = single_page.strip()
    if not url:
        continue  # skip blank lines, e.g. the empty string left by a trailing newline
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    masthead = soup.find(class_='mastheadTitle')
    pagehp = masthead.find_all('h1')
    for h1 in pagehp:
        print(h1.prettify())
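One thing to watch for when reading the file: if it ends with a newline or contains blank lines, splitting on \n leaves empty strings in the list, and requests.get('') raises an error. Below is a minimal sketch of a helper that filters those out. The name load_urls and the convention of ignoring lines that start with # are my own assumptions, not something from the question.

```python
from pathlib import Path


def load_urls(path):
    """Return the usable lines of a text file as a list of URL strings."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines (the comment convention is an assumption)
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```

You could then write `for url in load_urls("urls.txt"):` and pass url straight to requests.get().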
Upvotes: 2
Reputation: 3046
with open("urls.txt") as f:
    for line in f:
        page = requests.get(line.strip())
        ...
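With about 20 printers, it is likely that one is switched off or returns a different page at some point, and a single failed request would otherwise stop the whole loop. Here is a rough sketch of the same line-by-line idea with a timeout and basic error handling. The function names extract_models and scrape, the 5 second timeout, and the output format are assumptions of mine; the mastheadTitle class and the h1 lookup come from the question.

```python
import requests
from bs4 import BeautifulSoup


def extract_models(html):
    """Return the text of every <h1> inside the element with class 'mastheadTitle'."""
    soup = BeautifulSoup(html, "html.parser")
    masthead = soup.find(class_="mastheadTitle")
    if masthead is None:
        return []  # this page does not have the layout described in the question
    return [h1.get_text(strip=True) for h1 in masthead.find_all("h1")]


def scrape(url_file="urls.txt", timeout=5):
    """Print the model text found at each URL in url_file (one URL per line)."""
    with open(url_file) as f:
        for line in f:
            url = line.strip()
            if not url:
                continue  # ignore blank lines
            try:
                response = requests.get(url, timeout=timeout)
                response.raise_for_status()  # treat HTTP 4xx/5xx as failures
            except requests.exceptions.RequestException as exc:
                # Report the problem and move on to the next printer
                print(f"{url}: request failed ({exc})")
                continue
            for model in extract_models(response.text):
                print(f"{url}: {model}")
```

Calling scrape("urls.txt") runs the loop. Keeping the parsing in its own function also means you can try it on a saved copy of a printer page without touching the network.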
Upvotes: 0