Reputation: 169
I have written the following code to take a large csv file, and split it into multiple csv files based on a particular word in a column. The original csv file has some fields that are strings, and they have quotes around them.
For example:
Field1,Field2,Field3,Field4
1,2,"red",3
1,4,"red",4
3,4,"blue",4
etc.
My code splits the file into separate csvs based on Field4.
My output looks like this:
3.csv
Field1,Field2,Field3,Field4
1,2,red,3
4.csv
Field1,Field2,Field3,Field4
1,4,red,4
3,4,blue,4
I want my output to maintain quotes around the strings in field 3. The files are fed into a piece of software that only works if strings have quotes around them, which is quite annoying.
My current code looks like this:
import csv
#Creates empty set - this will be used to store the values that have already been used
newfilelist = set()
#Opens the large csv file in "read" mode
with open('File.csv', 'r') as csvfile:
    #Read the first row of the large file and store the whole row as a string (headerstring)
    read_rows = csv.reader(csvfile)
    headerrow = next(read_rows)
    headerstring=','.join(headerrow)
    for row in read_rows:
        #Store the whole row as a string (rowstring)
        rowstring=','.join(row)
        #Takes Field 4
        newfilename = (row[3])
        #This basically makes sure it is not looking at the header row.
        if newfilename != "field4":
            #If the newfilename is not in the newfilename set, add it to the list and create new csv file with header row.
            if newfilename not in newfilelist:
                newfilelist.add(newfilename)
                with open('//output/' +str(newfilename)+'.csv','a') as f:
                    f.write(headerstring)
                    f.write("\n")
                    f.close()
            #If the newfilename is in the newfilelist set, append the current row to the existing csv file.
            else:
                with open('//output/' +str(newfilename)+'.csv','a') as f:
                    f.write(rowstring)
                    f.write("\n")
                    f.close()
Can anybody advise me how to get the quotes around the strings? Unfortunately the software that uses my files requires them to be in this format!
Upvotes: 8
Views: 9199
Reputation: 2609
csv.writer might be overkill for what you are trying to do. If you want each line written out unchanged, quotes included, just write the raw line instead of parsing and rejoining it.
# Creates an empty dictionary - maps each Field4 value to its open output file
newfilelist = {}
# Opens the large csv file in "read" mode
with open('File.csv', 'r') as csvfile:
    # Read the first row of the large file and store the whole row as a string (headerstring)
    headerstring = csvfile.readline()
    for row in csvfile.readlines():
        # Takes Field 4: strip the trailing newline first, then any surrounding quotes
        newfilename = row.split(',')[3].strip().strip('"')
        # If the newfilename is not in the dictionary yet, create a new csv file with the header row.
        if newfilename not in newfilelist:
            # Open a file and store the file reference in the dictionary
            newfilelist[newfilename] = open('//output/' + str(newfilename) + '.csv', 'w')
            newfilelist[newfilename].write(headerstring)
        # Write the raw row (quotes intact) to its file
        newfilelist[newfilename].write(row)
# Close all open files
for k in newfilelist.keys():
    newfilelist[k].close()
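To see why the quotes disappear in the question's code but survive here, it may help to compare the two approaches on a single line. This is only an illustrative sketch: the sample line is taken from the question, and the variable names are made up for the demonstration.

```python
import csv
import io

# One data line from the question's example file
line = '1,2,"red",3\n'

# csv.reader parses the line and drops the quote characters,
# so joining the fields back together cannot restore them.
parsed = next(csv.reader(io.StringIO(line)))
rejoined = ",".join(parsed)

# Working on the raw line only needs the key (Field4);
# the line itself is written out untouched, quotes included.
key = line.split(",")[3].strip().strip('"')

print(parsed)    # ['1', '2', 'red', '3']
print(rejoined)  # 1,2,red,3
print(key)       # 3
```

One caveat with the raw-line approach: a plain split(',') will miscount fields if a quoted string ever contains a comma, so it is only safe when the data is as simple as in the example.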
Upvotes: 0
Reputation: 798536
Pass quoting=csv.QUOTE_NONNUMERIC to csv.writer().
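A rough sketch of how that could look, with some assumptions spelled out. csv.reader returns every field as a string by default, so a writer using QUOTE_NONNUMERIC would then quote the numbers too. To avoid that, this sketch also passes quoting=csv.QUOTE_NONNUMERIC to the reader, which converts unquoted fields to floats, and turns whole-number floats back into ints before writing. The header is read and written as a raw line so it stays unquoted. The function name split_by_field4 is hypothetical, and the output is collected in memory (io.StringIO) rather than in files purely to keep the example self-contained.

```python
import csv
import io


def split_by_field4(csvfile):
    """Return {Field4 value: csv text}, keeping quotes around string fields.

    Hypothetical helper for illustration; swap the StringIO buffers for
    real files opened under the output directory in actual use.
    """
    header = csvfile.readline()  # raw header line, written back unchanged
    # On the reader, QUOTE_NONNUMERIC converts every unquoted field to float
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)
    buffers = {}
    writers = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Turn whole-number floats such as 3.0 back into ints so they print as 3
        row = [int(v) if isinstance(v, float) and v.is_integer() else v for v in row]
        key = str(row[3])
        if key not in writers:
            buffers[key] = io.StringIO()
            buffers[key].write(header)
            # On the writer, QUOTE_NONNUMERIC quotes every non-numeric field
            writers[key] = csv.writer(
                buffers[key], quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n"
            )
        writers[key].writerow(row)
    return {key: buf.getvalue() for key, buf in buffers.items()}


sample = io.StringIO(
    'Field1,Field2,Field3,Field4\n1,2,"red",3\n1,4,"red",4\n3,4,"blue",4\n'
)
result = split_by_field4(sample)
print(result["4"])
```

Note that this relies on every unquoted field in the data rows being numeric; an unquoted non-numeric value would make the reader raise ValueError.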
Upvotes: 17