DanielSon

Reputation: 1555

How to go through a list of urls to retrieve page data - Python

In a .py file, I have a variable that's storing a list of urls. How do I properly build a loop to retrieve the code from each url, so that I can extract specific data items from each page?

This is what I've tried so far:

import requests
import re
from bs4 import BeautifulSoup
import csv

#Read csv
csvfile = open("gymsfinal.csv")
csvfilelist = csvfile.read()
print csvfilelist

#Get data from each url
def get_page_data():
    for page_data in csvfilelist.splitlines():
        r = requests.get(page_data.strip())
        soup = BeautifulSoup(r.text, 'html.parser')
        return soup

pages = get_page_data()
print pages

Upvotes: 0

Views: 1249

Answers (1)

salmanwahed

Reputation: 9657

By not using the csv module, you are reading the gymsfinal.csv file as a plain text file. Read through the documentation on reading/writing CSV files here: CSV File Reading and Writing.

Also, your current code will give you only the first page's soup, because get_page_data() returns as soon as it has created the first soup object. Instead, you can yield from the function, like:

def get_page_data():
    for page_data in csvfilelist.splitlines():
        r = requests.get(page_data.strip())
        soup = BeautifulSoup(r.text, 'html.parser')
        yield soup

pages = get_page_data()

# iterate over the generator
for page in pages:
    print page

Also, remember to close the file you opened.

Upvotes: 1
