viperfx

Parsing CSV files, and writing them into another CSV format

The import CSV files, fetched with urllib2 and saved into folders, are formatted like this:

number,season,episode,production code,airdate,title,special?,tvrage
1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"
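For reference, here is a minimal sketch (Python 3, standard library only) of how the csv module splits the sample above. The double quotes are stripped, so "101" and "Pilot" come back as plain strings, and a well-formed row has 8 fields, which is why indexing up to line[5] works for rows like this one.

```python
import csv
import io

# The two sample lines from the question, as they would appear in one file.
SAMPLE = (
    'number,season,episode,production code,airdate,title,special?,tvrage\n'
    '1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"\n'
)


def read_rows(text):
    """Parse CSV text into a header list and a list of data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # the first line holds the column names
    rows = list(reader)
    return header, rows


header, rows = read_rows(SAMPLE)
print(header[5], rows[0][5])  # the title column: quotes removed
print(len(rows[0]))           # number of fields in the data row
```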

Now I am successfully converting that into SQL statements, as well as into another CSV file that can be inserted into my database, in a format like this:

,1,1,1,"Pilot",'2006-10-11',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1

Using the following code

csv = """,%s,%s,%s,%s,%r,,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1""" % (showid, line[1], line[2], line[5], date(line[4]))
print>>final, csv
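The question doesn't show the date() helper. As a hedged sketch, a hypothetical convert_airdate that turns the airdate column (24/Sep/07) into the YYYY-MM-DD form used in the output could rely on datetime.strptime. It assumes every airdate follows the day/abbreviated-month/two-digit-year pattern; a blank or malformed airdate raises ValueError, which is worth catching if some rows lack a date.

```python
from datetime import datetime


def convert_airdate(airdate):
    """Convert an airdate like '24/Sep/07' to ISO format 'YYYY-MM-DD'.

    Hypothetical helper: assumes day/abbreviated-month/two-digit-year and
    English month abbreviations (the default C locale).
    Raises ValueError if the string doesn't match that pattern.
    """
    parsed = datetime.strptime(airdate, "%d/%b/%y")
    return parsed.strftime("%Y-%m-%d")


print(convert_airdate("24/Sep/07"))  # → 2007-09-24
```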

EDIT -

I have changed from string formatting to this:

csv = ','+showid+','+line[1]+','+line[2]+','+line[5]+','+date(line[4])+',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1'
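Building the line by hand, whether with % formatting or + concatenation, also breaks as soon as a field contains a comma or a quote: an episode title with a comma would shift every later column. A minimal Python 3 sketch using csv.writer, which quotes such fields automatically. format_row is a hypothetical helper, and the input row's title is made up to show the quoting. Note that csv.writer quotes only when needed, so plain titles and the date come out unquoted, unlike the question's sample output; the quoting parameter controls this if the importer requires quotes.

```python
import csv
import io

# Constant trailing columns copied from the question's sample output.
TRAILING = ["", "", "", "", "1", "2011-12-23 15:52:49",
            "2011-12-23 15:52:49", "1", "1"]


def format_row(showid, row, airdate):
    """Build one output row (hypothetical helper).

    showid and the already-converted airdate come from the caller;
    season, episode and title are row[1], row[2] and row[5].
    """
    return ["", showid, row[1], row[2], row[5], airdate] + TRAILING


out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
# Made-up input row whose title contains a comma.
row = ["1", "1", "2", "102", "01/Oct/07", "A title, with a comma", "n", ""]
writer.writerow(format_row("1", row, "2007-10-01"))
print(out.getvalue(), end="")
```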

It's not much better, and I am still having trouble with some files being skipped during parsing. I'm not sure whether it's me or the csv module.

The problem is that some files go through perfectly fine, some CSV files are just skipped, and for others I get errors like IndexError: list index out of range.
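IndexError: list index out of range is what you get when a row has fewer than six fields, for example a blank line, a truncated download, or a trailer line at the end of a file. A minimal Python 3 diagnostic sketch that reports which physical lines are too short, using csv.reader's line_num attribute; the blank line and footer line in the sample are invented for illustration.

```python
import csv
import io

EXPECTED_MIN_FIELDS = 6  # the conversion reads up to row[5]


def find_short_rows(text, min_fields=EXPECTED_MIN_FIELDS):
    """Return (line_number, row) pairs for rows too short to index row[5]."""
    reader = csv.reader(io.StringIO(text))
    problems = []
    for row in reader:
        if len(row) < min_fields:
            # reader.line_num counts the physical lines read so far
            problems.append((reader.line_num, row))
    return problems


sample = (
    'number,season,episode,production code,airdate,title,special?,tvrage\n'
    '1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"\n'
    '\n'                   # invented blank line
    'Some footer text\n'   # invented trailer line
)
for line_number, row in find_short_rows(sample):
    print(line_number, row)
```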

If anyone has experience getting CSV files to parse correctly, I would really appreciate the help.

Here is the full source code: http://cl.ly/2W472g303D1p0J3S2o46

dsimport.py - http://pastie.org/3076663
CSVFileHandler.py - http://pastie.org/3076667

Thanks


Answers (2)

viperfx

Never mind, it's all fixed. In the end I used the excel dialect and wrote the output CSV with pipe (|) delimiters. Either way it was quite fiddly, and honestly I feel like I got it to work through sheer luck.
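The fix above isn't shown in code. A rough Python 3 sketch of reading with the excel dialect and writing pipe-delimited output; the exact settings the asker used aren't known, so treat these as assumptions. With | as the delimiter, commas inside fields are written without quotes.

```python
import csv
import io


def convert_to_pipes(text):
    """Re-emit CSV text (read with the 'excel' dialect) as pipe-delimited lines."""
    reader = csv.reader(io.StringIO(text), dialect="excel")
    out = io.StringIO()
    # Same dialect, but '|' as the delimiter and '\n' line endings.
    writer = csv.writer(out, dialect="excel", delimiter="|", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


print(convert_to_pipes('1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"\n'), end="")
```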

Thanks for all the help.


ubershmekel

I'm not sure exactly what all the errors are, but here are a few tips:

  1. In processFile(line), line is a misleading name: it isn't a string line but a row, i.e. a list of fields. That's what confused Tim and me at first sight, too.
  2. You should verify that line has at least 6 elements as your script requires.
  3. You can use the str.join method, which is much tidier than concatenating with +.
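One caveat with tip 3: str.join only accepts strings, so a numeric value (for instance, if showid were an int rather than a string) raises TypeError. A small Python 3 sketch, with a hypothetical helper name, that converts each field first:

```python
def join_fields(fields):
    """Join fields with commas, converting non-strings (e.g. an int id) first."""
    return ",".join(str(field) for field in fields)


# ",".join(["", 42, "Pilot"]) would raise TypeError because 42 is not a str.
print(join_fields(["", 42, "1", "Pilot"]))  # → ,42,1,Pilot
```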

Here's a small refactoring:

def processFile(row):
    if len(row) < 6:
        #raise Exception('too few columns')
        # maybe it's better to just ignore bad rows in your case
        return
    items = [
        '',
        showid,
        row[1],
        row[2],
        row[5],
        date(row[4]),
        ]
    res = ','.join(items)
    res += ',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1'
    print res
    print>>final, res

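# name, final, showid and CSVFileHandler are defined elsewhere in the asker's script
handler = CSVFileHandler('/Users/tharshan/WebRoot/stv/export/csv/%s-save.csv' % name)
try:
    handler.process(processFile, name)
except Exception, e:
    # report the failing file and keep going instead of aborting the whole run
    print 'Failed processing and skipping %s because of: %s' % (name, e)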

final.close()
