user11749176

How would I go about scraping data from a website and updating a file with the new info each day while saving older data?

I was initially planning to use a CSV file. However, that would require me to manually log into VS Code each day and run my script to add the data to the file, and it would replace the old data that I had previously written.

Upvotes: 0

Views: 94

Answers (1)

bherbruck

Reputation: 2226

If your scraped dataset is small, scrape the data into a list of dictionaries with the structure [{<column1>: <data>, <column2>: <data>, ...}, ...], one dictionary per row you want to save. Then append it to a csv file by calling append_csv_dict(<path_to_your_csv>, <your_data>). The file is opened in append mode, so rows written on earlier runs are kept:

import csv

def append_csv_dict(path, data):
    '''
    Append rows to a csv file, using the dictionary keys as column headers
    Args:
        path (str): Path to the csv file
        data (dict or list): Dictionary or list(dict) with keys as
                             column headers and values as column data
    '''
    # treat a single dictionary as a list containing one row
    rows = [data] if isinstance(data, dict) else data
    # newline='' lets the csv module control line endings (avoids blank rows on Windows)
    with open(path, 'a', newline='') as file:
        # set the field names to the keys of the first row
        fieldnames = list(rows[0].keys())
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        # write the header if the file is new (append mode starts at the end, so 0 means empty)
        if file.tell() == 0:
            writer.writeheader()
        # write the rows
        writer.writerows(rows)

# some example data; pass a single dictionary instead if you only scrape one row per day
scraped_data = [
    {
        'first_name': 'John',
        'last_name': 'Do',
        'age': 31
    },
    {
        'first_name': 'Jane',
        'last_name': 'Do',
        'age': 33
    },
    {
        'first_name': 'Foo',
        'last_name': 'Bar',
        'age': 58
    }
]

append_csv_dict('./scrape.csv', scraped_data)

Output (scrape.csv):

first_name,last_name,age
John,Do,31
Jane,Do,33
Foo,Bar,58
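
Since the goal is to keep each day's data apart from the older rows, it may help to stamp every row with the date it was scraped before appending. This is only a rough sketch: the function name append_rows_with_date and the column name scraped_on are made up for illustration, and it assumes every row has the same keys.

```python
import csv
from datetime import date

def append_rows_with_date(path, rows, scraped_on=None):
    '''
    Append rows to a csv file with an extra 'scraped_on' column.
    'scraped_on' is a hypothetical column name chosen for this sketch.
    Returns the number of rows written.
    '''
    if not rows:
        return 0
    # default to today's date, formatted as YYYY-MM-DD
    stamp = (scraped_on or date.today()).isoformat()
    # put the date first, followed by the scraped columns
    stamped = [{'scraped_on': stamp, **row} for row in rows]
    # newline='' lets the csv module control line endings
    with open(path, 'a', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=list(stamped[0].keys()))
        # append mode starts at the end of the file, so 0 means the file is empty
        if file.tell() == 0:
            writer.writeheader()
        writer.writerows(stamped)
    return len(stamped)
```

Calling append_rows_with_date('./scrape.csv', scraped_data) once per day should add that day's rows under the existing ones, so you can filter by the scraped_on column later.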

Upvotes: 1
