Reputation: 1555
In a .py file, I have a variable that's storing a list of urls. How do I properly build a loop to retrieve the code from each url, so that I can extract specific data items from each page?
This is what I've tried so far:
import requests
import re
from bs4 import BeautifulSoup
import csv
#Read csv
csvfile = open("gymsfinal.csv")
csvfilelist = csvfile.read()
print csvfilelist
#Get data from each url
def get_page_data():
    for page_data in csvfilelist.splitlines():
        r = requests.get(page_data.strip())
        soup = BeautifulSoup(r.text, 'html.parser')
        return soup

pages = get_page_data()
print pages
Upvotes: 0
Views: 1249
Reputation: 9657
By not using the csv module, you are reading the gymsfinal.csv file as a plain text file. Read through the documentation on reading/writing csv files here: CSV File Reading and Writing.
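For example, a minimal sketch of reading the URLs with csv.reader (assuming gymsfinal.csv has one URL per row, in the first column):
import csv

with open("gymsfinal.csv") as csvfile:
    reader = csv.reader(csvfile)                       # each row is already split into a list of fields
    urls = [row[0].strip() for row in reader if row]   # first column holds the url
print urls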
Also, you will only get the first page's soup from your current code, because get_page_data() returns right after creating the first soup. For your current code, you can yield from the function instead:
def get_page_data():
    for page_data in csvfilelist.splitlines():
        r = requests.get(page_data.strip())
        soup = BeautifulSoup(r.text, 'html.parser')
        yield soup

pages = get_page_data()

# iterate over the generator
for page in pages:
    print page
Also, close the file you just opened.
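For example, a with block closes it for you (a sketch reusing the csvfile name from the question), or you can call csvfile.close() yourself when you are done:
with open("gymsfinal.csv") as csvfile:
    csvfilelist = csvfile.read()
# the file is closed automatically when the with block ends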
Upvotes: 1