Reputation: 3272
I'm extracting information about genes from a database, storing it in a dictionary after some modifications, and appending it to a CSV file.
The total number of genes is 489299, so at the end I should have a CSV file with 489299 lines. The script ran smoothly when I tested it on 10000 genes, but with all 489299 I got this error:
OSError: [Errno 24] Too many open files: 'output_agrold/Genes.csv'
Here is a snippet of the code I'm using:
# I have batches of genes
batches = ["Gene1 Gene2...", "Gene11 Gene12..."]

for batch in batches:
    genes_batch_dico = create_genes_info_dico(batch)
    # genes_batch_dico is a list of dictionaries with info about genes
    # genes_batch_dico = [{info about gene1}, {info about gene2}, ...]
    for gene_dico in genes_batch_dico:
        # I get info from gene_dico: gene_id, start_pos, end_pos .....
        # here I create / append to the CSV file
        with open(OUTPUT_PATH + '/Genes.csv', 'a') as f:
            w = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL, quotechar='"')
            row = [gene_id, start_pos, end_pos ....]
            w.writerow(row)
I checked the number of lines in the CSV file at that point and it was 52800.
When I looked on the internet, I found that this error is caused by having too many files open at the same time (which I don't think I'm doing here, since I'm only appending to one file). The suggested fix was to raise the maximum number of open files with the ulimit -n NUMBER command, so I increased the limit from 1024 to 4096, but I still get the same error when the number of lines reaches exactly 52800.
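(For reference, the limit that the running Python process actually sees, as opposed to the one set in the shell, can be checked with the standard-library resource module; this is just a minimal sketch on Linux, not part of the fix:)

import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors;
# getrlimit returns the (soft, hard) pair the process is running with.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# The soft limit can be raised up to the hard limit without root, e.g.:
# resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))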
OS : Fedora 28.
Upvotes: 0
Views: 1277
Reputation: 46779
Assuming you are using Python 3.x, you only need to open your CSV file once for writing. Currently you are opening and closing it in append mode once for each line you write.
Better pseudo-code for what you need would be:
import csv
import os

batches = ["Gene1 Gene2...", "Gene11 Gene12..."]

with open(os.path.join(OUTPUT_PATH, 'Genes.csv'), 'w', newline='') as f:
    w = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL, quotechar='"')

    for batch in batches:
        genes_batch_dico = create_genes_info_dico(batch)

        for gene_dico in genes_batch_dico:
            row = [gene_id, start_pos, end_pos ....]
            w.writerow(row)
os.path.join() is a safer way to join parts of a file path together.
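For example, on Linux this builds the same path as the string concatenation in the question (the directory name is taken from your error message):

import os

OUTPUT_PATH = 'output_agrold'                   # output directory from the question
print(os.path.join(OUTPUT_PATH, 'Genes.csv'))   # output_agrold/Genes.csv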
If you are still using Python 2.x, the csv module expects the file to be opened in binary mode and open() has no newline argument, so change that line to:
with open(os.path.join(OUTPUT_PATH, 'Genes.csv'), 'wb') as f:
Upvotes: 2