Reputation: 26648
I am working with Scrapy and writing the data fetched from web pages into CSV files.
My pipeline code:
import csv

def __init__(self):
    self.file_name = csv.writer(open('example.csv', 'wb'))
    self.file_name.writerow(['Title', 'Release Date', 'Director'])

def process_item(self, item, spider):
    self.file_name.writerow([item['Title'].encode('utf-8'),
                             item['Release Date'].encode('utf-8'),
                             item['Director'].encode('utf-8'),
                             ])
    return item
My output in the CSV file is:
Title,Release Date,Director
And Now For Something Completely Different,1971,Ian MacNaughton
Monty Python And The Holy Grail,1975,Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,1979,Terry Jones
.....
But is it possible to write the title and its values in one aligned column, the release date and its values in the next column, and the director and its values in the next (since a CSV is just comma-separated values), so the file looks like the format below?
Title,                                       Release Date,  Director
And Now For Something Completely Different,  1971,          Ian MacNaughton
Monty Python And The Holy Grail,             1975,          Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,                1979,          Terry Jones
Any help would be appreciated. Thanks in advance.
Upvotes: 0
Views: 1620
Reputation: 10923
Update -- code refactored in order to:
- use a generator function, as suggested by @madjar, and
- fit more closely to the code snippet provided by the OP.
I am trying an alternative using texttable. It produces output identical to that in the question. This output may be written to a CSV file, although the records would need massaging for the appropriate CSV dialect, and I cannot find a way to keep using csv.writer and still get the padded spaces in each field.
Title,                                       Release Date,  Director
And Now For Something Completely Different,  1971,          Ian MacNaughton
Monty Python And The Holy Grail,             1975,          Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,                1979,          Terry Jones
Here is a sketch of the code you would need to produce the result above:
from texttable import Texttable

# ----------------------------------------------------------------
# Imagine data to be generated by Scrapy, for each record:
# a dictionary of three items. The first set of functions
# generates the data for use in the texttable function.

def process_item(item):
    # This massages each record in preparation for writing to csv
    item['Title'] = item['Title'].encode('utf-8') + ','
    item['Release Date'] = item['Release Date'].encode('utf-8') + ','
    item['Director'] = item['Director'].encode('utf-8')
    return item

def initialise_dataset():
    data = [{'Title': 'Title',
             'Release Date': 'Release Date',
             'Director': 'Director'
             },  # first item holds the table header
            {'Title': 'And Now For Something Completely Different',
             'Release Date': '1971',
             'Director': 'Ian MacNaughton'
             },
            {'Title': 'Monty Python And The Holy Grail',
             'Release Date': '1975',
             'Director': 'Terry Gilliam and Terry Jones'
             },
            {'Title': "Monty Python's Life Of Brian",
             'Release Date': '1979',
             'Director': 'Terry Jones'
             }
            ]
    data = [process_item(item) for item in data]
    return data

def records(data):
    for item in data:
        yield [item['Title'], item['Release Date'], item['Director']]

# this ends the data simulation part
# --------------------------------------------------------

def create_table(data):
    # Create the table
    table = Texttable(max_width=0)
    table.set_deco(Texttable.HEADER)
    table.set_cols_align(["l", "c", "c"])
    table.add_rows(records(data))
    # split, remove the underlining below the header
    # and pull together again. Many ways of cleaning this...
    tt = table.draw().split('\n')
    del tt[1]  # remove the line under the header
    tt = '\n'.join(tt)
    return tt

if __name__ == '__main__':
    data = initialise_dataset()
    table = create_table(data)
    print table
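As a usage note: because each drawn line already carries its own commas and padding, the table string can be written to the file with a plain write() call; csv.writer would re-quote the padded fields. A minimal sketch, with the table string written out by hand and io.StringIO standing in for open('example.csv', 'w'):

```python
import io

# Hypothetical stand-in for the string returned by create_table() above.
table_text = ("Title,                         Release Date,  Director\n"
              "Monty Python's Life Of Brian,  1979,          Terry Jones")

# A plain file-like write preserves the padding exactly as drawn.
out = io.StringIO()
out.write(table_text + '\n')
```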
Upvotes: 1
Reputation: 12951
TSV (tab-separated values) might get you what you want, but it often turns ugly when the lines have very different lengths.
You can easily write a bit of code to produce such a table. The trick is that you need to have all the rows before outputting, in order to compute the width of each column.
You can find lots of snippets for that on the internet; here is one I have used before.
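A minimal sketch of that idea (my own illustration, not the linked snippet): collect all the rows first, take each column's maximum width, then pad every field with str.ljust:

```python
# Toy dataset matching the question; in real use these rows would be
# collected from the Scrapy items before any output is written.
rows = [
    ['Title', 'Release Date', 'Director'],
    ['And Now For Something Completely Different', '1971', 'Ian MacNaughton'],
    ['Monty Python And The Holy Grail', '1975', 'Terry Gilliam and Terry Jones'],
    ["Monty Python's Life Of Brian", '1979', 'Terry Jones'],
]

# The width of each column is the length of the longest value it contains.
widths = [max(len(row[col]) for row in rows) for col in range(len(rows[0]))]

# Pad every field to its column width; rstrip drops trailing blanks.
lines = [', '.join(field.ljust(width) for field, width in zip(row, widths)).rstrip()
         for row in rows]
```

Joining `lines` with newlines gives a table whose commas line up in every row.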
Upvotes: 1