Bilal

Reputation: 3272

Errno 24 Too many open files when appending to a large CSV file

I'm extracting some information about genes from a database, storing it in a dictionary after some modifications, and appending it to a CSV file.

The total number of genes is 489299, so at the end I should have a CSV file with 489299 lines. The script ran smoothly when I tested it on 10000 genes, but with all 489299 I get this error:

OSError: [Errno 24] Too many open files: 'output_agrold/Genes.csv'

Here is a snippet of the code I'm using:

# I have batches of Genes
batches = ["Gene1 Gene2...", "Gene11 Gene12..."]
for batch in batches:
    genes_batch_dico = create_genes_info_dico(batch)
    # genes_batch_dico is a list of dictionaries which have info about genes
    # genes_batch_dico = [{info about gene1}, {info about gene2}, ...]
    for gene_dico in genes_batch_dico:
        # I get info from gene_dico : gene_id, start_pos, end_pos .....
        # here I create the CSV file
        with open(OUTPUT_PATH + '/Genes.csv', 'a') as f:
            w = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL, quotechar='\"')
            row = [ gene_id, start_pos, end_pos .... ]
            w.writerow(row)

I checked the number of lines in the CSV file I got, and it was 52800 lines.

When I looked on the internet, I found that this error is due to opening many files at the same time (which I don't think I'm doing here; I'm only appending to one file). The suggested fix was to raise the maximum number of open files with the ulimit -n NUMBER command, so I increased the limit from 1024 to 4096, but I'm still getting the same error as soon as the file reaches exactly 52800 lines.
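For reference, the same limit can also be read (and raised up to the hard limit) from inside Python with the standard resource module; this is just a minimal sketch of what I mean by the limit, not part of my actual script:

import resource

# current per-process limit on open file descriptors;
# the soft value is what ulimit -n reports
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)

# the soft limit can be raised up to the hard limit without root
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))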

OS : Fedora 28.

Upvotes: 0

Views: 1277

Answers (1)

Martin Evans

Reputation: 46779

Assuming you are using Python 3.x, you only need to open your CSV file once for writing. Currently you are opening and closing it in append mode once for each line you write.

Better pseudo-code for what you need would be:

import csv
import os

batches = ["Gene1 Gene2...", "Gene11 Gene12..."]

with open(os.path.join(OUTPUT_PATH, 'Genes.csv'), 'w', newline='') as f:
    w = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL, quotechar='\"')

    for batch in batches:
        genes_batch_dico = create_genes_info_dico(batch)

        for gene_dico in genes_batch_dico:
            # extract gene_id, start_pos, end_pos, ... from gene_dico as before
            row = [ gene_id, start_pos, end_pos .... ]
            w.writerow(row)

os.path.join() is a safer way to join parts of a file path together.
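For example (hypothetical paths, just to illustrate the behaviour):

import os

# os.path.join only inserts the separator when it is missing, so both
# calls below give 'output_agrold/Genes.csv' on Linux
print(os.path.join('output_agrold', 'Genes.csv'))
print(os.path.join('output_agrold/', 'Genes.csv'))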

If you are still using Python 2.x, then change this line:

with open(os.path.join(OUTPUT_PATH, 'Genes.csv'), 'wb') as f:

Upvotes: 2
