Reputation: 327
The CSV files I import (fetched with urllib2 and saved into folders) are formatted like this:
```
number,season,episode,production code,airdate,title,special?,tvrage
1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"
```
I am successfully converting these into SQL statements, as well as into another CSV file that can be inserted into my database, in a format like this:
```
,1,1,1,"Pilot",'2006-10-11',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1
```
I build each output line with the following code:
```python
csv = """,%s,%s,%s,%s,%r,,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1""" % (showid, line[1],line[2], line[5], date(line[4]))
print>>final, csv
```
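The `date()` helper called above isn't shown in the question. A minimal Python 3 sketch of what it might look like, assuming it converts the tvrage airdate format (`24/Sep/07`) into an ISO `YYYY-MM-DD` string; the body is a hypothetical stand-in, not the asker's code:

```python
from datetime import datetime


def date(airdate):
    """Hypothetical stand-in for the question's date() helper.

    Assumes airdates look like '24/Sep/07' (day / abbreviated English month /
    two-digit year) and returns them as 'YYYY-MM-DD'.
    """
    # %b matches English month abbreviations under the default C locale.
    return datetime.strptime(airdate.strip(), "%d/%b/%y").strftime("%Y-%m-%d")


print(date("24/Sep/07"))  # 2007-09-24
```

`strptime` raises `ValueError` for a blank or malformed airdate, so rows like that need handling before this call.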
EDIT:
I have switched from string formatting to plain concatenation:
```python
csv = ','+showid+','+line[1]+','+line[2]+','+line[5]+','+date(line[4])+',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1'
```
It's not much better, and some files are still being skipped during parsing. I'm not sure whether the problem is my code or the csv module.
The problem is that some files go through perfectly fine, some CSV files are skipped entirely, and for others I get errors like `IndexError: list index out of range`.
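An `IndexError` on `line[5]` means that row has fewer fields than the index being read; blank lines, for example, come back from `csv.reader` as empty lists. A Python 3 sketch for locating such rows; `find_short_rows` is a hypothetical helper, and using the header's column count as the expected width is an assumption:

```python
import csv
import io


def find_short_rows(f, expected=None):
    """Return (line_number, column_count) for every row shorter than expected.

    If expected is None, the header row's column count is used.
    Blank lines come back from csv.reader as [] and are reported too.
    """
    reader = csv.reader(f)
    header = next(reader, None)
    if header is None:
        return []  # empty file: nothing to check
    if expected is None:
        expected = len(header)
    problems = []
    for row in reader:
        if len(row) < expected:
            # line_num counts physical lines read so far, including the header
            problems.append((reader.line_num, len(row)))
    return problems


sample = (
    "number,season,episode,production code,airdate,title,special?,tvrage\n"
    '1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"\n'
    "\n"
    "2,1,2\n"
)
print(find_short_rows(io.StringIO(sample)))  # [(3, 0), (4, 3)]
```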
If anyone has experience with getting CSV files to parse correctly, I would really appreciate the help.
Here is the full source code: http://cl.ly/2W472g303D1p0J3S2o46
- dsimport.py: http://pastie.org/3076663
- CSVFileHandler.py: http://pastie.org/3076667
Thanks
Reputation: 327
Never mind, it's all fixed. In the end I just used the excel dialect and wrote the output CSV with pipes (`|`) as the delimiter. Either way it was quite fiddly, and honestly I feel like I got it to work through sheer luck.
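A minimal Python 3 sketch of that approach: read with the `excel` dialect and write with `|` as the delimiter. `comma_to_pipe` is a hypothetical name and the sample titles are made up. With a pipe delimiter, a title containing a comma no longer needs quoting, while a field that contains a pipe still gets quoted:

```python
import csv
import io


def comma_to_pipe(src, dst):
    """Copy rows from a comma-separated source to a pipe-delimited destination."""
    reader = csv.reader(src, dialect="excel")
    # Keyword arguments override individual attributes of the excel dialect.
    writer = csv.writer(dst, dialect="excel", delimiter="|", lineterminator="\n")
    for row in reader:
        writer.writerow(row)


src = io.StringIO('number,title\n1,"Hello, World"\n2,"A|B"\n')
dst = io.StringIO()
comma_to_pipe(src, dst)
print(dst.getvalue(), end="")
# number|title
# 1|Hello, World
# 2|"A|B"
```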
Thanks for all the help.
Reputation: 12798
I'm not sure exactly what all the errors are, but here are a few tips:
- `line` is a bit of a bad name, as it isn't a string line; it's a row, or list of elements. That's what confused Tim and me at first sight as well.
- Check that `line` has at least 6 elements, as your script requires.
- Use the `join` method, which is awesome.

Here's a small refactoring:
```python
def processFile(row):
    if len(row) < 6:
        #raise Exception('too few columns')
        # maybe it's better to just ignore bad rows in your case
        return
    items = [
        '',
        showid,
        row[1],
        row[2],
        row[5],
        date(row[4]),
    ]
    res = ','.join(items)
    res += ',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1'
    print res
    print>>final, res

handler = CSVFileHandler('/Users/tharshan/WebRoot/stv/export/csv/%s-save.csv' % name)
try:
    handler.process(processFile, name)
except Exception, e:
    print 'Failed processing and skipping %s because of: %s' % (name, e)
final.close()
```
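One caveat with `','.join`: it does not quote fields, so a title containing a comma shifts every later column. A Python 3 sketch of the same refactoring that writes through `csv.writer` instead; `write_episodes`, the `convert_date` parameter, and the sample rows are hypothetical, and the trailing constant columns are copied from the question:

```python
import csv
import io

# Constant trailing columns copied from the question's output format.
TRAILER = ["", "", "", "", "1", "2011-12-23 15:52:49", "2011-12-23 15:52:49", "1", "1"]


def write_episodes(src, dst, showid, convert_date):
    """Hypothetical Python 3 port of the answer's processFile.

    Skips the header and any row with fewer than 6 fields, and writes each
    remaining row with csv.writer so that fields containing commas or quotes
    are quoted correctly. Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    next(reader, None)  # skip the header row
    written = 0
    for row in reader:
        if len(row) < 6:
            continue  # too few columns: skip instead of raising IndexError
        writer.writerow(["", showid, row[1], row[2], row[5], convert_date(row[4])] + TRAILER)
        written += 1
    return written


sample = (
    "number,season,episode,production code,airdate,title,special?,tvrage\n"
    '1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"\n'
    "\n"
    '2,1,2,"102",01/Oct/07,"Hello, World",n,"http://example.com/x"\n'
    "3,1\n"
)
out = io.StringIO()
write_episodes(io.StringIO(sample), out, "1", lambda s: s)  # identity date conversion
print(out.getvalue(), end="")
```

The blank line and the two-field row are skipped, and `Hello, World` is written as `"Hello, World"` so it stays in one column.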