Shiva Krishna Bavandla

Reputation: 26648

Writing to separate columns instead of comma separated for CSV files in Scrapy

I am working with Scrapy and writing the data fetched from web pages into CSV files.

My pipeline code:

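import csv

class ExamplePipeline(object):  # class name assumed; not shown in the question

    def __init__(self):
        # Python 3: text mode with newline='' is what the csv module expects
        self.file = open('example.csv', 'w', newline='', encoding='utf-8')
        self.writer = csv.writer(self.file)
        self.writer.writerow(['Title', 'Release Date', 'Director'])

    def process_item(self, item, spider):
        self.writer.writerow([item['Title'],
                              item['Release Date'],
                              item['Director']])
        return item

    def close_spider(self, spider):
        self.file.close()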

The output in my CSV file looks like this:

Title,Release Date,Director
And Now For Something Completely Different,1971,Ian MacNaughton
Monty Python And The Holy Grail,1975,Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,1979,Terry Jones
.....

Is it possible to write the title and its values into one column, the release date and its values into the next column, and the director and its values into the column after that, so that the columns line up in the CSV file (which is just comma-separated values), as in the format below?

        Title,                                 Release Date,            Director
And Now For Something Completely Different,      1971,              Ian MacNaughton
Monty Python And The Holy Grail,                 1975,     Terry Gilliam and Terry Jones
Monty Python's Life Of Brian,                    1979,              Terry Jones

Any help would be appreciated. Thanks in advance.

Upvotes: 0

Views: 1620

Answers (2)

daedalus

Reputation: 10923

Update: code refactored to:

  1. use a generator function, as suggested by @madjar, and
  2. fit more closely with the code snippet provided by the OP.

The Target Output

Here is an alternative using texttable. It reproduces the format shown in the question. This output can be written to a CSV file, although the records need massaging for the appropriate CSV dialect, and I could not find a way to keep using csv.writer and still get the padded spaces in each field.

                  Title,                      Release Date,             Director            
And Now For Something Completely Different,       1971,              Ian MacNaughton        
Monty Python And The Holy Grail,                  1975,       Terry Gilliam and Terry Jones 
Monty Python's Life Of Brian,                     1979,                Terry Jones    
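One possible way around the csv.writer limitation, sketched under the assumption that no value begins with a space: compute the column widths first, then put the padding at the start of the following field, so that each comma stays attached to its value. When reading the file back, csv.reader with skipinitialspace=True drops that padding again. The helper name write_padded_csv is hypothetical.

```python
import csv
import io

def write_padded_csv(rows, out):
    """Hypothetical helper: write rows with csv.writer so that every comma
    directly follows its value and the next field is space-padded to align."""
    # First pass: widest value in every column except the last one
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]) - 1)]
    writer = csv.writer(out, lineterminator='\n')
    for row in rows:
        padded = [row[0]]
        for i in range(1, len(row)):
            # Leading spaces fill out the previous column, plus one separator
            gap = widths[i - 1] - len(row[i - 1]) + 1
            padded.append(' ' * gap + row[i])
        writer.writerow(padded)

rows = [
    ['Title', 'Release Date', 'Director'],
    ['And Now For Something Completely Different', '1971', 'Ian MacNaughton'],
    ['Monty Python And The Holy Grail', '1975', 'Terry Gilliam and Terry Jones'],
    ["Monty Python's Life Of Brian", '1979', 'Terry Jones'],
]

buf = io.StringIO()
write_padded_csv(rows, buf)
print(buf.getvalue())

# Reading back: skipinitialspace ignores the spaces right after each comma
parsed = list(csv.reader(io.StringIO(buf.getvalue()), skipinitialspace=True))
assert parsed == rows
```

The padding lives inside the fields, so any consumer that does not skip initial spaces will see the values with leading blanks.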

The Code

Here is a sketch of the code you would need to produce the result above:

from texttable import Texttable

# ----------------------------------------------------------------
# Imagine data to be generated by Scrapy, for each record:
# a dictionary of three items. The first set of functions
# generates the data for use in the texttable function.

def process_item(item):
    # Massage each record for the CSV output: append a comma to the
    # first two fields so the padded table still reads as comma-separated
    item['Title'] = item['Title'] + ','
    item['Release Date'] = item['Release Date'] + ','
    return item

def initialise_dataset():
    data = [
        {'Title': 'Title',  # first item holds the table header
         'Release Date': 'Release Date',
         'Director': 'Director'},
        {'Title': 'And Now For Something Completely Different',
         'Release Date': '1971',
         'Director': 'Ian MacNaughton'},
        {'Title': 'Monty Python And The Holy Grail',
         'Release Date': '1975',
         'Director': 'Terry Gilliam and Terry Jones'},
        {'Title': "Monty Python's Life Of Brian",
         'Release Date': '1979',
         'Director': 'Terry Jones'},
    ]

    data = [ process_item(item) for item in data ]
    return data

def records(data):
    for item in data:
        yield [item['Title'], item['Release Date'], item['Director'] ]

# this ends the data simulation part
# --------------------------------------------------------

def create_table(data):
    # Create the table
    table = Texttable(max_width=0)
    table.set_deco(Texttable.HEADER)
    table.set_cols_align(["l", "c", "c"])
    # materialise the generator; add_rows takes the first row as the header
    table.add_rows(list(records(data)))

    # split, remove the underlining below the header
    # and pull together again. Many ways of cleaning this...
    tt = table.draw().split('\n')
    del tt[1] # remove the line under the header
    tt = '\n'.join(tt)
    return tt

if __name__ == '__main__':
    data = initialise_dataset()
    table = create_table(data)
    print(table)

Upvotes: 1

madjar

Reputation: 12951

TSV (tab-separated values) might get you what you want, but it often turns ugly when the lines have very different lengths.
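For comparison, the standard csv module writes TSV when given delimiter='\t'. Whether the columns then line up depends on the viewer's tab stops: with the common 8-column stops, the 42-character first title pushes its row far past the header's tab positions, which is the ugliness described above. The rows below are a subset of the question's data.

```python
import csv
import io

rows = [
    ['Title', 'Release Date', 'Director'],
    ['And Now For Something Completely Different', '1971', 'Ian MacNaughton'],
    ["Monty Python's Life Of Brian", '1979', 'Terry Jones'],
]

buf = io.StringIO()
# delimiter='\t' turns the csv writer into a TSV writer
writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
writer.writerows(rows)
print(buf.getvalue())
```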

You can easily write a bit of code to produce such a table. The trick is that you need all the rows before writing any output, so that you can compute the width of each column.
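A minimal sketch of that two-pass idea inside a Scrapy item pipeline, assuming the three fields from the question: process_item only buffers rows, and close_spider computes the widths and writes the aligned table once every item is known. The names AlignedTablePipeline and format_table and the example.txt default path are hypothetical; this is not the snippet the answer refers to.

```python
def format_table(rows):
    """Return rows as text, with each column padded to its widest cell."""
    # Every cell except the last one in a row carries its trailing comma
    cells = [[c + ',' for c in row[:-1]] + [row[-1]] for row in rows]
    # First pass over all rows: width of each column that gets padded
    widths = [max(len(row[i]) for row in cells) for i in range(len(cells[0]) - 1)]
    # Second pass: left-justify each cell to its column width
    lines = []
    for row in cells:
        padded = [cell.ljust(w) for cell, w in zip(row, widths)]
        lines.append(' '.join(padded + [row[-1]]))
    return '\n'.join(lines) + '\n'


class AlignedTablePipeline:
    """Hypothetical pipeline: buffer items, write the table on close."""

    fields = ('Title', 'Release Date', 'Director')

    def __init__(self, path='example.txt'):
        self.path = path
        self.rows = [list(self.fields)]  # header row first

    def process_item(self, item, spider):
        # Only collect here: widths are unknown until every item has arrived
        self.rows.append([str(item[name]) for name in self.fields])
        return item

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes
        with open(self.path, 'w', encoding='utf-8') as f:
            f.write(format_table(self.rows))
```

The trade-off of this design is that every item stays in memory until the crawl ends, since nothing can be written before the column widths are known.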

You can find lots of snippets for that on the internet; here is one I used before.

Upvotes: 1
