user2246905

Reputation: 1039

Create new headers when writing a csv using python

I'm scraping several webpages, and for each webpage I write one row to a CSV file:

import csv
fieldnames=["Title", "Author", "year"]
counter=1
for webpage in webpages:
    if counter==1:
        f = open('file.csv', 'wb')  
        my_writer = csv.DictWriter(f, fieldnames)
        my_writer.writeheader()
        f.close()

    # ... code that gets the information (title, author and year) for each webpage ...

    variables={ele:"NA" for ele in fieldnames}
    variables['Title']=title        
    variables['Author']=author
    variables['year']=year


    with open('file.csv', 'a+b') as f:
        header = next(csv.reader(f))
        dict_writer = csv.DictWriter(f, header)
        dict_writer.writerow(variables)
    counter+=1

However, a webpage can have more than one author, so after scraping, author is actually a list. I would like the CSV headers to be author1, author2, author3, etc., but I don't know in advance what the maximum number of authors will be. So inside the loop I would like to edit the header and add author2, author3, etc. whenever a row needs more author columns than the file currently has.

Upvotes: 0

Views: 113

Answers (2)

Sirius

Reputation: 736

It could be something like:

def write_to_csv(file_name, records, fieldnames=None):
    import csv

    # newline='' lets the csv module control line endings (Python 3)
    with open('/tmp/' + file_name, 'w', newline='') as csvfile:
        if not fieldnames:
            # default to the keys of the first record
            fieldnames = list(records[0].keys())
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
        writer.writeheader()
        for row in records:
            writer.writerow(row)

def scrape():
    for webpage in webpages:
        webpage_data = [{'title': '', 'author1': 'foo', 'author2': 'bar'}]  # sample data
        write_to_csv(webpage[0].title + '.csv', webpage_data, webpage_data[0].keys())

I'm assuming:

  • The data is consistent within a single webpage, but may differ from one webpage to the next in the loop
  • The webpage data is a list of dictionaries that map field names to values
  • The code above targets Python 3

So in the loop we just get the data and pass the relevant fieldnames and values to another function, which writes them to the CSV.
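If everything has to end up in one file with author1, author2, ... columns, one possible variation is to collect all records first and derive the fieldnames from the longest author list before writing anything. This is only a sketch under my own assumptions: the `authors` key, the helper names `expand_authors` and `write_records`, and the sample data are made up for illustration, and it writes to an in-memory buffer instead of a real file.

```python
import csv
import io

def expand_authors(record):
    """Return a copy of record with its 'authors' list spread over author1..authorN."""
    row = {key: value for key, value in record.items() if key != "authors"}
    for i, name in enumerate(record.get("authors", []), start=1):
        row[f"author{i}"] = name
    return row

def write_records(csvfile, records):
    """Write all records at once, with as many author columns as the longest list needs."""
    max_authors = max((len(r.get("authors", [])) for r in records), default=0)
    fieldnames = ["title", "year"] + [f"author{i}" for i in range(1, max_authors + 1)]
    # restval fills the author columns that a given row does not use
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, restval="NA")
    writer.writeheader()
    writer.writerows(expand_authors(r) for r in records)

# Hypothetical scraped data: one dict per webpage
records = [
    {"title": "A", "year": 2001, "authors": ["foo"]},
    {"title": "B", "year": 2002, "authors": ["foo", "bar", "baz"]},
]

buffer = io.StringIO()  # stand-in for open('file.csv', 'w', newline='')
write_records(buffer, records)
output = buffer.getvalue()
print(output)
```

The trade-off is that nothing is written until all webpages have been scraped, but the header never needs to be edited afterwards.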

Upvotes: 1

Dave

Reputation: 3570

Because "Author" is a variable-length list, you should serialize it in some way to fit inside a single field. For example, use a semicolon as a separator.

Assuming your webpage object has an authors field containing all the authors, you would change your assignment line to something like this:

variables['Authors']=';'.join(webpage.authors)

This is a simple serialization of all of the authors. You can of course come up with something else - use a different separator or serialize to JSON or YAML or something more elaborate like that.
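As a rough illustration of the round trip, here is a small sketch that joins the authors into one field, writes the row, and splits the field again when reading it back. The author names and the in-memory buffer are placeholders of mine, and it assumes the separator never appears inside an author name.

```python
import csv
import io
import json

authors = ["Ada Lovelace", "Alan Turing"]  # hypothetical scraped list

# Option 1: join with a separator that does not occur in the names
joined = ";".join(authors)

# Option 2: JSON copes with any characters in the names
as_json = json.dumps(authors)

buffer = io.StringIO()  # stand-in for a real file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=["Title", "Authors", "year"])
writer.writeheader()
writer.writerow({"Title": "Example", "Authors": joined, "year": 1950})

# Reading back: split the single field into a list again
buffer.seek(0)
row = next(csv.DictReader(buffer))
restored = row["Authors"].split(";")
```

Note that the header has to contain the same key you assign to ("Authors" here), otherwise DictWriter raises a ValueError for the unknown field.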

Hopefully that gives some ideas.

Upvotes: 1
