I am looking for some advice.
I am currently building a k-mer database and need to merge, sort, and take the unique lines from 47 sample.txt.gz files of 16 GB each. What would be the fastest way to do this?
I am currently running this:
zcat *.merged.kmers.txt.gz | sort --parallel=48 --buffer-size=1400G | uniq | gzip > all_unique_kmers.txt.gz
I have been running it under Slurm, but I wanted to know what parameters others would use and what someone else would do differently. It has been running for 4 days!
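For what it's worth, `sort -u` can fold the separate `uniq` stage into the sort itself, and GNU sort's `--compress-program` keeps the temporary spill files compressed (a large out-of-core sort writes a lot of temp data). A toy-scale sketch of that variant, with invented file names and tiny buffer sizes just for illustration:

```shell
# Toy-scale version of the pipeline: sort -u deduplicates during the sort,
# and --compress-program=gzip compresses sort's temporary spill files.
# File name, data, and sizes are invented for illustration only.
tmp=$(mktemp -d)
printf 'TTTT\nACGT\nACGT\nGGGG\n' | gzip > "$tmp/sample.merged.kmers.txt.gz"

# LC_ALL=C forces byte-order comparison, which is much faster than
# locale-aware sorting and is fine for plain ACGT k-mer strings.
zcat "$tmp"/*.merged.kmers.txt.gz \
  | LC_ALL=C sort -u --parallel=4 --buffer-size=64M --compress-program=gzip \
  | gzip > "$tmp/all_unique_kmers.txt.gz"

zcat "$tmp/all_unique_kmers.txt.gz"
```

At real scale you would keep `--parallel=48` and a large `--buffer-size`, but note that `--buffer-size=1400G` only helps if the node actually has that much free RAM; otherwise sort spills to `$TMPDIR`, so pointing `TMPDIR` at fast local scratch matters too.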
47 samples, 16 GB compressed / 80 GB uncompressed each.
Merge, sort, deduplicate.
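An alternative worth considering: sort and deduplicate each sample once on its own, then do a k-way merge of the already-sorted streams with `sort -m -u`. The per-file sorts are embarrassingly parallel (e.g. one Slurm array task per sample), and the final merge is a linear pass rather than a second full sort. A minimal runnable sketch on toy data (file names and contents are placeholders; assumes bash for process substitution and GNU sort):

```shell
# Sketch: pre-sort each sample, then merge the sorted streams.
# Toy data only; at real scale step 1 would be one job per sample.
tmp=$(mktemp -d)
cd "$tmp"
printf 'ACGT\nTTTT\nACGT\n' | gzip > s1.kmers.txt.gz
printf 'GGGG\nACGT\n'       | gzip > s2.kmers.txt.gz

# Step 1: sort and dedup each sample independently (parallelizable).
for f in *.kmers.txt.gz; do
  zcat "$f" | LC_ALL=C sort -u | gzip > "${f%.txt.gz}.sorted.gz"
done

# Step 2: sort -m merges already-sorted inputs in one linear pass;
# -u drops duplicates that appear across samples.
LC_ALL=C sort -m -u <(zcat s1.kmers.sorted.gz) <(zcat s2.kmers.sorted.gz) \
  | gzip > all_unique_kmers.txt.gz
```

With 47 samples the merge step would list all 47 sorted streams (or loop over them); the key point is that each 80 GB file is only sorted once, and a failed job only costs one sample's sort rather than four days of a monolithic pipeline.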
Please, someone help me...