Prince Francis

Reputation: 3097

Concatenate multiple (say 10) large (>100 MB) CSV files using Python

I have 12 large CSV files with the same structure, and I would like to combine them into a single CSV file without repeating the headers. Currently I am using shutil as follows.

import shutil
import time

csv_files = ['file1.csv', 'file2.csv', 'file3.csv', 'file4.csv', 'file5.csv', 'file6.csv']
target_file_name = 'target.csv'

start_time = time.time()
# Copy the first file wholesale, header included.
shutil.copy(csv_files[0], target_file_name)
with open(target_file_name, 'a') as out_file:
    for source_file in csv_files[1:]:
        with open(source_file, 'r') as in_file:
            in_file.readline()  # skip the header line
            shutil.copyfileobj(in_file, out_file)
print("--- %s seconds ---" % (time.time() - start_time))

Edit

When I ran time cat file[1-4].csv > BigBoy in the terminal, I got the following output: 0.08s user 4.57s system 60% cpu 7.644 total. That is, the cat command took about 7.6 seconds in total, while the Python program took 17.46 seconds. I used 4 CSV files, each 116 MB in size.

I would like to know whether there are other methods in Python to handle this scenario more efficiently. You can download large CSV files from here.
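One variant worth trying, as a sketch rather than a measured result: open the files in binary mode so Python skips newline decoding and translation, and pass a larger chunk size to shutil.copyfileobj. The file names below reuse the question's hypothetical examples:

import shutil

csv_files = ['file1.csv', 'file2.csv', 'file3.csv', 'file4.csv']  # hypothetical names
target_file_name = 'target.csv'

with open(target_file_name, 'wb') as out_file:
    for i, source_file in enumerate(csv_files):
        with open(source_file, 'rb') as in_file:
            if i > 0:
                in_file.readline()  # skip the header of every file after the first
            shutil.copyfileobj(in_file, out_file, 1024 * 1024)  # copy in 1 MiB chunks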

Upvotes: 0

Views: 170

Answers (1)

Viktor Chukhantsev

Reputation: 61

Better to use csvstack from csvkit for this; it stacks the files and writes a single header row for you. csvkit also provides many other tools for working with CSV files from the console.

csvstack file1.csv file2.csv ... > target.csv
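If you want to drive this from the existing Python script instead of the terminal, here is a minimal sketch using subprocess, assuming csvkit is installed and reusing the question's file names:

import subprocess

csv_files = ['file1.csv', 'file2.csv', 'file3.csv']  # hypothetical names

# csvstack writes the stacked CSV (with a single header row) to stdout,
# so redirect stdout into the target file.
with open('target.csv', 'wb') as out_file:
    subprocess.run(['csvstack', *csv_files], stdout=out_file, check=True)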

Upvotes: 2
