jazzjackrabbit
jazzjackrabbit

Reputation: 33

CSV append overwrites everything but column headers

I'm trying to fetch online reviews (multiple pages), extract parts of each review (title, user, text,...) and write that information to a csv file. Yeah those questions have been asked multiple times, but i couldn't find the one that solves my issue below:

First I create & prepare the column headers of the csv file at the beginning:

with open('review-raw-data.csv', 'wb') as output:
    fieldnames = ['title', 'text', 'starRating', 'helpfulScore', 'date', 'user', 'id', 'url']
    writer = csv.DictWriter(output, delimiter=',', fieldnames=fieldnames, quoting=csv.QUOTE_ALL, restval='unknown',                       extrasaction='ignore')

Which works fine. Later I'm trying to append the extracted information to that csv file:

def extract(data):
    with open('review-raw-data.csv', 'ab') as output:
        fieldnames = ['title', 'text', 'starRating', 'helpfulScore', 'date', 'user', 'id', 'url']
        writer = csv.DictWriter(output, delimiter=',', fieldnames=fieldnames, lineterminator='\n', quoting=csv.QUOTE_ALL, restval='unknown', extrasaction='ignore')

        for review in data:
            # extraction happening...
            reviewobj = Review(title, text, helpfulscore, rating, date, user, reviewid, url)

            writer.writerow({'title': reviewobj.title, 'text': reviewobj.text, 'starRating': reviewobj.rating,
                         'helpfulScore': reviewobj.helpfulscore, 'date': reviewobj.date, 'user': reviewobj.user,
                         'id': reviewobj.reviewid, 'url': reviewobj.url})

This function gets called after each review-page has been received. So this probably isn't the smartest/easiest way to do it, but it works. The problem is, that the appending part isn't working as expected when calling this code for the 2nd, 3rd, ... time, because all the rows appended in previous iterations get overwritten. The column headers remain in place.

Example of what i wanted: (columns separated by ',')

title, user, id
title1, user1, id1
title2, user2, id2
title3, user3, id3

Example of what i get after 2nd iteration:

title, user, id
title2, user2, id2  # row 1 is missing...

Example of what i get after 3rd iteration:

title, user, id
title3, user3, id3  # rows 1 & 2 are missing...

What am I doing wrong?

Upvotes: 0

Views: 1247

Answers (1)

bruno desthuilliers
bruno desthuilliers

Reputation: 77912

It's impossible to tell exactly what goes wrong without the whole code - and without knowing how you call that code -, but you're obviously calling the "create & prepare the column headers" part of the code more than once, because the following works as expected:

bruno@bigb:~/Work/playground$ cat appcsv.py
import csv

with open('review-raw-data.csv', 'wb') as output:
    fieldnames = ['a', 'b', 'c']
    writer = csv.DictWriter(output, delimiter=',', fieldnames=fieldnames, quoting=csv.QUOTE_ALL, restval='unknown', extrasaction='ignore')
    writer.writeheader()


def extract(data):
    with open('review-raw-data.csv', 'ab') as output:
        fieldnames = ['a', 'b', 'c']
        writer = csv.DictWriter(output, delimiter=',', fieldnames=fieldnames, quoting=csv.QUOTE_ALL, restval='unknown', extrasaction='ignore')
        for row in data:
            writer.writerow(dict(zip(fieldnames, row)))


dataset = [
    [(1, 2, 3), (4, 5, 6)],
    [(5, 6, 7),]
    ]

for data in dataset:
    extract(data)


bruno@bigb:~/Work/playground$ python appcsv.py
bruno@bigb:~/Work/playground$ cat review-raw-data.csv 
"a","b","c"
"1","2","3"
"4","5","6"
"5","6","7"

Now it's easy to avoid overwriting an existing file : just check if it exists before opening it:

import os

filename = 'review-raw-data.csv'
flag = "ab" if os.path.exists(filename) else "wb"
with open(filename, flag) as output:
   # etc

As a side note: you have quite a lot of duplicated code (fieldnames definition, opening the file and creating a DictWriter). You should either factor this out in a function, and/or only do this stuff once and pass the writer to extract.

def get_writer(outfile):
    fieldnames = [# etc ]
    writer = csv.DictWriter(outfile, delimiter=',', fieldnames=fieldnames, quoting=csv.QUOTE_ALL, restval='unknown', extrasaction='ignore')

def extract(data, writer):
    for review in data:
        # extraction happening...
        reviewobj = Review(title, text, helpfulscore, rating, date, user, reviewid, url)
        writer.writerow({
           'title': reviewobj.title, 'text': reviewobj.text, 
           'starRating': reviewobj.rating, 
           'helpfulScore': reviewobj.helpfulscore, 
           'date': reviewobj.date, 'user': reviewobj.user,
           'id': reviewobj.reviewid, 'url': reviewobj.url
            })

def main():
    filename = 'review-raw-data.csv'
    exists = os.path.exists(filename)
    flag = "ab" if exists else "wb"
    with open(filename) as outfile:
        writer = get_writer(outfile)
        if not exists:
            writer.writeheaders()
        for data in whereever_you_get_your_data_from():
             extract(data, writer)

Upvotes: 1

Related Questions