Reputation: 121
I have a lot of CSV files, each made of 3 columns.
Example file names: file_1, file_4, file_5, file_7, etc.
(They all share the same name and differ only in the number at the end. The numbers are not necessarily consecutive, as the example shows.)
The contents look like this:
['357', '29384', '0.0031545741324921135']
['357', '29389', '0.0031545741324921135']
['357', '29526', '0.0368574903844921735']
['357', '35516', '0.0036775741324564665']
['357', '35551', '0.0023554341325646453']
['357', '35639', '0.0064467781324766535']
['357', '36238', '0.0067543874132467543']
['357', '37162', '0.0031545746577921135']
Let's name the 3 columns [a, b, c]. I'd like to sort the rows by c, the last column. I have to read all the files and sort their combined contents into one huge file. I could use a pickle, for example.
My first idea was:
import csv
from operator import itemgetter
fn = 1
# N as the max number in the really last file
while fn < N:
    newfile = open("file_{fn}.csv","r")
    reader = csv.reader(newfile)
    file = open("BigSortedFile.csv","w")
    for line in sorted(reader, key=itemgetter(2)):
        file.write(line)
    fn = fn +1
file.close()
#after the loop I think I have to sort again the BigSortedFile.
But it's not working because file.write() needs a string, and each line from the reader is a list of fields. How can I do the whole process?
Upvotes: 0
Views: 801
Reputation: 51643
To sort all lines you need to read them all into one data structure, sort it, then write them out again.
The csv module needs you to open files with newline="" to work properly.
When you use a csv.reader to read, you can also use a csv.writer to write your data:
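import csv

N = 42  # last number in the file names is 42
data = []
# the first file has number 1 in its name; include N itself
for fn in range(1, N + 1):
    try:
        # f-string, so that {fn} is actually replaced by the number
        with open(f"file_{fn}.csv", "r", newline="") as newfile:
            data.extend(csv.reader(newfile))
    except FileNotFoundError:
        continue  # the numbers are not consecutive, so skip the gaps
# compare column c as a number; as strings, "1e-05" would sort after "0.5"
data.sort(key=lambda row: float(row[2]))
with open("BigSortedFile.csv", "w", newline="") as bf:
    writer = csv.writer(bf)
    writer.writerows(data)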
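Since the file numbers are not consecutive, another option is to let glob find the files instead of counting up to N. The following is only a sketch under a few assumptions: the function name merge_sorted_csv and the pattern "file_*.csv" are made up for illustration, every non-empty row has a third column that float() can parse, and the output file name does not match the pattern.

```python
import csv
import glob


def merge_sorted_csv(pattern, out_path):
    """Read every CSV matching `pattern`, sort all rows by the third
    column (compared as a number) and write them to `out_path`.
    Returns the number of rows written."""
    rows = []
    for path in glob.glob(pattern):
        with open(path, "r", newline="") as src:
            # csv.reader yields [] for blank lines; skip those
            rows.extend(row for row in csv.reader(src) if row)
    # convert column c to float so the order is numeric, not alphabetical
    rows.sort(key=lambda row: float(row[2]))
    with open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)
    return len(rows)
```

It could be called as merge_sorted_csv("file_*.csv", "BigSortedFile.csv"). Like the loop above, it holds all rows in memory at once, so it only fits data that is small enough for that.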
Upvotes: 1