PythonLearner23

Reputation: 45

For loop does not work when web scraping multiple URLs. Only scrapes one URL

I am trying to web scrape multiple websites for different types of products. I was able to scrape a single URL successfully, so I created a list of URLs to scrape them all and export each product name and price to a CSV file. However, it does not appear to be working as intended.

Below is my code:

#imports
import pandas as pd
import requests
from bs4 import BeautifulSoup

#Product Websites For Consolidation
urls = ['https://www.aeroprecisionusa.com/ar15/lower-receivers/stripped-lowers?product_list_limit=all', 'https://www.aeroprecisionusa.com/ar15/lower-receivers/complete-lowers?product_list_limit=all']
for url in urls:
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')


    #Locating All Products On Page
    all_products_on_page = soup.find(class_='products wrapper container grid products-grid')
    individual_items = all_products_on_page.find_all(class_='product-item-info')


    #Breaking Down Product By Name And Price
    aero_product_name = [item.find(class_='product-item-link').text for item in individual_items]
    aero_product_price = [p.text if (p := item.find(class_='price')) is not None else 'no price' for item in individual_items]


    Aero_Stripped_Lowers_Consolidated = pd.DataFrame(
        {'Aero Product': aero_product_name,
        'Prices': aero_product_price,
        })

    Aero_Stripped_Lowers_Consolidated.to_csv('MasterPriceTracker.csv')

The code exports the product name and price to a CSV file as desired, but only for the second URL, the "complete-lowers" one. I'm not sure what I'm messing up in the for loop that causes it not to scrape both URLs. I verified that the HTML structure is the same for both pages.

Any help would be greatly appreciated!

Upvotes: 1

Views: 661

Answers (1)

Sam Chats

Reputation: 2321

Move the to_csv call outside the loop. Because it was inside the loop, it overwrote the CSV file on every iteration, so only the data from the last URL ended up in the file.

Within the loop, append each page's data to a DataFrame created before the loop starts. Also, there is no need to redefine the headers on every iteration, so I pulled them outside the loop too.

import pandas as pd
import requests
from bs4 import BeautifulSoup

#Product Websites For Consolidation
urls = ['https://www.aeroprecisionusa.com/ar15/lower-receivers/stripped-lowers?product_list_limit=all', 'https://www.aeroprecisionusa.com/ar15/lower-receivers/complete-lowers?product_list_limit=all']

Aero_Stripped_Lowers_Consolidated = pd.DataFrame(columns=['Aero Product', 'Prices'])
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"}

for url in urls:
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')


    #Locating All Products On Page
    all_products_on_page = soup.find(class_='products wrapper container grid products-grid')
    individual_items = all_products_on_page.find_all(class_='product-item-info')


    #Breaking Down Product By Name And Price
    aero_product_name = [item.find(class_='product-item-link').text for item in individual_items]
    aero_product_price = [p.text if (p := item.find(class_='price')) is not None else 'no price' for item in individual_items]


    Aero_Stripped_Lowers_Consolidated = Aero_Stripped_Lowers_Consolidated.append(pd.DataFrame(
        {'Aero Product': aero_product_name,
        'Prices': aero_product_price,
        }))

Aero_Stripped_Lowers_Consolidated.to_csv('MasterPriceTracker.csv')
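
Note that DataFrame.append was deprecated and removed in pandas 2.0. If you are on a newer pandas version, the same idea is to collect one DataFrame per URL in a plain list and concatenate them once after the loop. A minimal sketch of that variant, assuming the same URLs and CSS classes as above:

import pandas as pd
import requests
from bs4 import BeautifulSoup

urls = ['https://www.aeroprecisionusa.com/ar15/lower-receivers/stripped-lowers?product_list_limit=all', 'https://www.aeroprecisionusa.com/ar15/lower-receivers/complete-lowers?product_list_limit=all']
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"}

frames = []  # one DataFrame per URL, combined once at the end
for url in urls:
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')

    # Locate all products on the page and pull out name/price for each
    all_products_on_page = soup.find(class_='products wrapper container grid products-grid')
    individual_items = all_products_on_page.find_all(class_='product-item-info')

    aero_product_name = [item.find(class_='product-item-link').text for item in individual_items]
    aero_product_price = [p.text if (p := item.find(class_='price')) is not None else 'no price' for item in individual_items]

    frames.append(pd.DataFrame({'Aero Product': aero_product_name, 'Prices': aero_product_price}))

# Single concatenation and single write after all URLs have been scraped
Aero_Stripped_Lowers_Consolidated = pd.concat(frames, ignore_index=True)
Aero_Stripped_Lowers_Consolidated.to_csv('MasterPriceTracker.csv')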

Upvotes: 3
