Reputation: 11
I am using BeautifulSoup to scrape websites in Python.
Where the URLs follow a predictable pagination pattern, I have been able to loop through them successfully:
baseUrl = "https://www.example.com/inventory/page="
outputDataframe = list()
i = 1
for pageNumber in range(1, 10):
url = baseUrl + str(pageNumber)
print(url)
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
However, I now have a CSV of URLs to scrape. The pages share uniform classes and attributes, but the URLs themselves are unique and do not follow a pattern.
How do I loop through the URLs in that CSV efficiently and parse each page with BeautifulSoup?
Many thanks.
So far, I have had success looping over uniformly structured URLs. However, I do not know how to read in a CSV of unique URLs and then apply the same scraping steps to each one.
Upvotes: 1
Views: 160
Reputation: 43
For importing the CSV I would work with pandas:
import pandas as pd
df = pd.read_csv('URLs.csv', delimiter=',')
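If the file has no header row (an assumption about how URLs.csv is laid out), the first URL would otherwise be swallowed as a column name, so something like this may be needed instead:

# Assumed: URLs.csv has no header row; its single column is then labelled 0.
df = pd.read_csv('URLs.csv', header=None)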
Then convert the DataFrame column to a list (I assume the file only has one column):
urlList = list(df.iloc[:, 0])
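If the column might contain empty cells (a small defensive assumption about the data), dropping them first avoids passing NaN values to requests:

# Drop any blank rows before building the list of URLs.
urlList = list(df.iloc[:, 0].dropna())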
After that simply iterate through the list:
for url in urlList:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
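Putting the pieces together, a minimal end-to-end sketch might look like the following. The CSV layout, the h1 extraction, and the output filename are assumptions for illustration; swap in whatever selectors match the uniform classes and attributes on your pages:

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumed: URLs.csv contains a single column of URLs with no header row.
urlList = pd.read_csv('URLs.csv', header=None).iloc[:, 0].dropna().tolist()

results = []
for url in urlList:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    # Hypothetical extraction: take the page's first <h1>, if present.
    heading = soup.find("h1")
    results.append({
        "url": url,
        "heading": heading.get_text(strip=True) if heading else None,
    })

# Write everything scraped to a new CSV (output name is an assumption).
pd.DataFrame(results).to_csv('scraped.csv', index=False)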
Upvotes: 1