Reputation: 1039
I'm scraping several webpages, and for each webpage I write one row of a CSV file:
    import csv

    fieldnames = ["Title", "Author", "year"]
    counter = 1
    for webpage in webpages:
        if counter == 1:
            # Create the file and write the header once
            with open('file.csv', 'w', newline='') as f:
                my_writer = csv.DictWriter(f, fieldnames)
                my_writer.writeheader()
        # ... something where I get the information (title, author and year) for each webpage ...
        variables = {ele: "NA" for ele in fieldnames}
        variables['Title'] = title
        variables['Author'] = author
        variables['year'] = year
        # Re-read the header from the file, then append the new row
        with open('file.csv', 'a+', newline='') as f:
            f.seek(0)
            header = next(csv.reader(f))
            dict_writer = csv.DictWriter(f, header)
            dict_writer.writerow(variables)
        counter += 1
However, there can be more than one author (after scraping, author is actually a list), so I would like the CSV header to contain author1, author2, author3, etc. I don't know in advance what the maximum number of authors will be. So inside the loop I would like to edit the header and add author2, author3, etc. whenever a row needs more author columns than the header currently has.
Upvotes: 0
Views: 113
Reputation: 736
It could be something like:
    import csv

    def write_to_csv(file_name, records, fieldnames=None):
        # Write a list of dicts to /tmp/<file_name>, one row per record
        with open('/tmp/' + file_name, 'w', newline='') as csvfile:
            if not fieldnames:
                # Fall back to the keys of the first record
                fieldnames = list(records[0].keys())
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
            writer.writeheader()
            for row in records:
                writer.writerow(row)

    def scrape():
        for webpage in webpages:
            webpage_data = [{'title': '', 'author1': 'foo', 'author2': 'bar'}]  # sample data
            write_to_csv(webpage[0].title + '.csv', webpage_data, list(webpage_data[0].keys()))
I'm assuming:
So in the loop, we'll just get the data and pass the relevant fieldnames and values to another function, which writes them to the CSV file.
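If everything has to end up in one file, a variation on the same idea is to collect the rows first and write the file once, when the widest row is known. This is only a minimal sketch, assuming Python 3 and that each scraped result is a dict with an `authors` list; the `scraped` sample data and the helper names `expand_authors` and `write_rows` are made up for illustration. It writes to an in-memory buffer so it can be run as-is; a real file opened with `newline=''` would work the same way.

```python
import csv
import io


def expand_authors(record):
    """Flatten one record into Title, author1, author2, ..., year (hypothetical helper)."""
    row = {"Title": record["title"], "year": record["year"]}
    for i, name in enumerate(record["authors"], start=1):
        row["author%d" % i] = name
    return row


def write_rows(csvfile, records):
    """Write all records at once so the header covers the widest row."""
    rows = [expand_authors(r) for r in records]
    max_authors = max((len(r["authors"]) for r in records), default=0)
    author_columns = ["author%d" % i for i in range(1, max_authors + 1)]
    fieldnames = ["Title"] + author_columns + ["year"]
    # restval fills in the author columns that a shorter row does not have
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, restval="NA")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Sample data standing in for the scraping results (assumption)
scraped = [
    {"title": "A", "authors": ["foo"], "year": 2001},
    {"title": "B", "authors": ["foo", "bar", "baz"], "year": 2002},
]

buffer = io.StringIO(newline="")
header = write_rows(buffer, scraped)
print(buffer.getvalue())
```

The trade-off is that nothing is written until all pages have been scraped, which avoids rewriting the header of an existing file every time a longer author list turns up.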
Upvotes: 1
Reputation: 3570
Because "Author" is a variable-length list, you should serialize it in some way to fit inside a single field. For example, use a semicolon as a separator.
Assuming you have an authors field with all the authors in it on your webpage object, you would want to change your assignment line to something like this:

    variables['Authors'] = ';'.join(webpage.authors)
This is a simple serialization of all of the authors. You can of course come up with something else - use a different separator or serialize to JSON or YAML or something more elaborate like that.
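As a rough illustration of the round trip, here is a sketch assuming Python 3 and made-up sample data; the helper names `serialize_authors` and `parse_authors` are hypothetical. The join on write is paired with a split on read, and an in-memory buffer stands in for the real file:

```python
import csv
import io

SEPARATOR = ";"


def serialize_authors(authors):
    """Pack a list of author names into a single CSV field."""
    return SEPARATOR.join(authors)


def parse_authors(field):
    """Unpack the field back into a list; an empty field means no authors."""
    return field.split(SEPARATOR) if field else []


fieldnames = ["Title", "Authors", "year"]
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({
    "Title": "A",
    "Authors": serialize_authors(["foo", "bar"]),
    "year": 2001,
})

# Read the row back and recover the list of authors
buffer.seek(0)
rows = list(csv.DictReader(buffer))
authors = parse_authors(rows[0]["Authors"])
print(authors)
```

This only works if no author name contains the separator itself, so pick a character that cannot occur in the data.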
Hopefully that gives some ideas.
Upvotes: 1