piccachilly

Reputation: 11

Looping through csv of URLs using BeautifulSoup

I am leveraging BeautifulSoup to scrape websites in Python.

Where URLs have a predictable structure for pagination, I have been looping successfully:

import requests
from bs4 import BeautifulSoup

baseUrl = "https://www.example.com/inventory/page="
outputDataframe = list()

for pageNumber in range(1, 10):
    url = baseUrl + str(pageNumber)
    print(url)

    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

However, I now have a csv of URLs to scrape. The pages share uniform classes and attributes, but the URLs themselves are unique and do not follow a pattern.

How do I get BeautifulSoup to loop through a csv efficiently?

Many thanks.

So far, I have had success with uniform URLs using a loop. However, I do not know how to import a csv of unique URLs and then perform the same function on each one.

Upvotes: 1

Views: 160

Answers (1)

Sebastian von Rotz

Reputation: 43

For importing a csv I would work with pandas:

import pandas as pd

# pass header=None if the csv has no header row, otherwise
# the first URL is consumed as a column name
df = pd.read_csv('URLs.csv', delimiter=',')

Then transform the dataframe column to list (I assume it only has one column):

urlList = list(df.iloc[:, 0])

After that simply iterate through the list:

for url in urlList:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
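If pandas feels heavy for a one-column file, the standard-library `csv` module does the same job. A minimal sketch (the file name `URLs.csv` and the single-column, headerless layout are assumptions; the demo file is written inline here only so the snippet runs on its own):

```python
import csv

# Demo file written inline so the sketch is self-contained; in practice
# the csv would already exist on disk.
with open("URLs.csv", "w", newline="") as f:
    f.write("https://www.example.com/a\nhttps://www.example.com/b\n")

# Read the first column of every non-empty row into a list of URLs.
with open("URLs.csv", newline="") as f:
    urlList = [row[0] for row in csv.reader(f) if row]

# The scraping loop is then identical to the paginated version:
# for url in urlList:
#     page = requests.get(url)
#     soup = BeautifulSoup(page.content, "html.parser")
```

The scraping loop itself is left commented out because it needs network access; only the csv-to-list step differs from the pagination case.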

Upvotes: 1

Related Questions