Reputation: 45
I am trying to web scrape multiple websites for different types of products. I was able to scrape one URL, so I created a list to scrape multiple URLs and export the product name and price to a CSV file. However, it does not appear to be working as needed.
Below is my code:
#imports
import pandas as pd
import requests
from bs4 import BeautifulSoup

#Product Websites For Consolidation
urls = ['https://www.aeroprecisionusa.com/ar15/lower-receivers/stripped-lowers?product_list_limit=all', 'https://www.aeroprecisionusa.com/ar15/lower-receivers/complete-lowers?product_list_limit=all']

for url in urls:
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')

    #Locating All Products On Page
    all_products_on_page = soup.find(class_='products wrapper container grid products-grid')
    individual_items = all_products_on_page.find_all(class_='product-item-info')

    #Breaking Down Product By Name And Price
    aero_product_name = [item.find(class_='product-item-link').text for item in individual_items]
    aero_product_price = [p.text if (p := item.find(class_='price')) is not None else 'no price' for item in individual_items]

    Aero_Stripped_Lowers_Consolidated = pd.DataFrame(
        {'Aero Product': aero_product_name,
         'Prices': aero_product_price,
        })
    Aero_Stripped_Lowers_Consolidated.to_csv('MasterPriceTracker.csv')
The code exports the product name and price to a CSV file as desired, but only for the second URL, the "complete-lowers" one. I'm not sure what I'm getting wrong in the for loop that keeps it from scraping both URLs. I verified the HTML structure is the same for both pages.
Any help would be greatly appreciated!
Upvotes: 1
Views: 661
Reputation: 2321
Move the to_csv call outside the loop. Because it was inside the loop, it rewrote the CSV file on every iteration, so only the last URL's results ended up in the file. Within the loop, collect each page's results and concatenate them into a single dataframe once the loop finishes (DataFrame.append was removed in pandas 2.0, so pd.concat is used here instead). Also, there is no need for the headers to be redefined each time in the loop, so I pulled them outside too.

import pandas as pd
import requests
from bs4 import BeautifulSoup

#Product Websites For Consolidation
urls = ['https://www.aeroprecisionusa.com/ar15/lower-receivers/stripped-lowers?product_list_limit=all', 'https://www.aeroprecisionusa.com/ar15/lower-receivers/complete-lowers?product_list_limit=all']

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"}
frames = []

for url in urls:
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')

    #Locating All Products On Page
    all_products_on_page = soup.find(class_='products wrapper container grid products-grid')
    individual_items = all_products_on_page.find_all(class_='product-item-info')

    #Breaking Down Product By Name And Price
    aero_product_name = [item.find(class_='product-item-link').text for item in individual_items]
    aero_product_price = [p.text if (p := item.find(class_='price')) is not None else 'no price' for item in individual_items]

    frames.append(pd.DataFrame(
        {'Aero Product': aero_product_name,
         'Prices': aero_product_price,
        }))

Aero_Stripped_Lowers_Consolidated = pd.concat(frames, ignore_index=True)
Aero_Stripped_Lowers_Consolidated.to_csv('MasterPriceTracker.csv')
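The overwrite behavior is easy to reproduce in isolation, without any scraping. A minimal sketch (the file name demo.csv is just illustrative):

```python
import pandas as pd

# Calling to_csv inside the loop opens the file in write mode each time,
# replacing whatever the previous iteration wrote.
for i in range(3):
    pd.DataFrame({'batch': [i]}).to_csv('demo.csv', index=False)

# Only the final iteration's data remains in the file.
with open('demo.csv') as f:
    print(f.read())  # prints "batch" then "2" - batches 0 and 1 are gone
```

The same thing happened with the two product URLs: each pass through the loop replaced MasterPriceTracker.csv, so only the last URL's rows survived.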
Upvotes: 3