Kryton

Reputation: 95

Scraper only outputting data from last URL to CSV

I'm very new to Python and am learning by doing small projects. I'm currently trying to collect some information from various web pages; however, when the script outputs the scraped data to CSV, the file only contains data from the last URL.

Ideally, I want it to write to the CSV rather than append, since I only want the latest data from the most recent scrape.

I've looked through some similar questions on Stack Overflow, but I'm either not understanding them or they're just not working for me (probably the former).

Any help would be greatly appreciated.

import csv
import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = ['URL1','URL2']

for URL in URL:
    response = requests.get(URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    nameElement = soup.find('p', attrs={'class':'name'}).a
    nameText = nameElement.text.strip()

    priceElement = soup.find('span', attrs={'class':'price'})
    priceText = priceElement.text.strip()



columns = [['Name','Price'], [nameText, priceText]]


with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(columns)

Upvotes: 0

Views: 32

Answers (1)

furas

Reputation: 143197

You have to open the file before the `for` loop and write every row inside the loop. Your version writes the CSV after the loop finishes, by which point `nameText` and `priceText` only hold the values from the last URL.

URLs = ['URL1', 'URL2']

with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)

    # write the header once, before the loop
    writer.writerow(['Name', 'Price'])

    for url in URLs:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')

        nameElement = soup.find('p', attrs={'class': 'name'}).a
        nameText = nameElement.text.strip()

        priceElement = soup.find('span', attrs={'class': 'price'})
        priceText = priceElement.text.strip()

        # write one row per page, inside the loop
        writer.writerow([nameText, priceText])
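Since you already import pandas, a third option is to collect the rows into a list and let `DataFrame.to_csv` write the file. This is just a sketch: the fetching/parsing is omitted and the hard-coded `rows` stand in for the `nameText`/`priceText` pairs your loop would build.

```python
import pandas as pd

# Each list is one scraped row; in your loop you would append
# [nameText, priceText] here instead of hard-coding values.
rows = [
    ['Widget A', '9.99'],
    ['Widget B', '4.50'],
]

df = pd.DataFrame(rows, columns=['Name', 'Price'])

# mode='w' (the default) overwrites the file on every run, so only the
# latest scrape is kept; index=False drops pandas' row-number column.
df.to_csv('index.csv', index=False)
```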

Alternatively, create a list before the `for` loop, `append()` each row to it, and write everything out at the end:

URLs = ['URL1', 'URL2']

# start with the header row
columns = [['Name', 'Price']]

for url in URLs:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')

    nameElement = soup.find('p', attrs={'class': 'name'}).a
    nameText = nameElement.text.strip()

    priceElement = soup.find('span', attrs={'class': 'price'})
    priceText = priceElement.text.strip()

    # collect one row per page
    columns.append([nameText, priceText])

with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(columns)
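One caveat with either version: if a page is missing the `p.name` or `span.price` element, `soup.find(...)` returns `None`, and the following `.a` or `.text` raises `AttributeError`, aborting the whole scrape. A small guard lets the loop skip such pages. This sketch parses inline HTML strings instead of fetching live URLs, so the markup here is made up for illustration:

```python
from bs4 import BeautifulSoup

pages = [
    '<p class="name"><a>Widget A</a></p><span class="price">9.99</span>',
    '<div>no product markup here</div>',  # simulates a page missing the elements
]

rows = []
for html in pages:
    soup = BeautifulSoup(html, 'html.parser')
    name_el = soup.find('p', attrs={'class': 'name'})
    price_el = soup.find('span', attrs={'class': 'price'})
    if name_el is None or name_el.a is None or price_el is None:
        # skip pages where the expected markup is missing
        continue
    rows.append([name_el.a.text.strip(), price_el.text.strip()])
```

After the loop, `rows` contains only the pages that matched, e.g. one row for the first page above.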

Upvotes: 1
