krisGoks

Reputation: 23

Need Java advice for handling billions of records in unindexed files

I have 4 big .tab files, one of which is 6GB and the others 10GB each. The 6GB file contains information about animals of a certain region, and the other 3 files contain other vital information related to each animal present in the 6GB file.

I am required to write a program that produces small data sets from these big files based on some user inputs.

I read the animal data from the 6GB file line by line, and if a record passes certain criteria it is stored in an ArrayList; otherwise it is omitted.

Then, for each animal in the ArrayList, I need to go through the other 3 files over and over again to filter it further and finally produce the small data set the user needs. As of now it takes about 7 hours of run time to fetch a small data set of 1500 animal records. The main culprit is that for each animal I select into the ArrayList, I have to look up the other 3 files multiple times at different steps of the data extraction process.

I have already written code in Java for this, but the program is incredibly slow. I use buffered readers to access these files, but I am looking for other tools and techniques I can use in Java to turn this into an efficient, usable system.
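In simplified form, the filtering loop looks roughly like this (the column positions and the filter criterion are placeholders, not my real logic):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    public class AnimalFilter {
        public static void main(String[] args) throws IOException {
            List<String[]> selected = new ArrayList<>();
            // Read the 6GB animals file line by line with a buffered reader
            try (BufferedReader reader = Files.newBufferedReader(Paths.get("animals.tab"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] cols = line.split("\t");
                    // Placeholder criterion: keep animals from a user-chosen region
                    if (cols.length > 2 && cols[2].equals("someRegion")) {
                        selected.add(cols);
                    }
                }
            }
            // Later, for each selected animal, the other three files are
            // scanned again and again, which is where most of the 7 hours go.
            System.out.println("Selected " + selected.size() + " animals");
        }
    }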

I have considered pushing the data into a SQL or NoSQL database, but I need expert advice to guide me in the right direction before I do something to improve the performance.

Thanks in advance

Upvotes: 1

Views: 122

Answers (1)

Tschallacka

Reputation: 28742

Well, I'd go with SQLite if you need portability, or another database engine otherwise. That way you can dissect the data into bite-size related portions.

You need to "digest" the data first so it becomes searchable and properly linked. You'd make a table of animal names with an id, so if a user searches "cheetah" you can use the id of the cheetah row to link to the other tables of information.

A cheetah belongs to the continent Africa and countries x, y, z; it is a type of cat, a type of predator, a type of carnivore, etc., and all of those things should be linked together. I believe you'd reduce the size of the database significantly just by grouping and categorising a lot of the duplicated data and linking it.
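As a rough idea (the table and column names here are just examples, adapt them to your actual data), the digested schema could look something like this using the sqlite-jdbc driver:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class SchemaSetup {
        public static void main(String[] args) throws SQLException {
            // Requires the sqlite-jdbc driver on the classpath
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:animals.db");
                 Statement st = conn.createStatement()) {
                // One row per animal; repeated attributes live in their own tables
                st.execute("CREATE TABLE IF NOT EXISTS animal (" +
                           "id INTEGER PRIMARY KEY, name TEXT, continent_id INTEGER, type_id INTEGER)");
                st.execute("CREATE TABLE IF NOT EXISTS continent (id INTEGER PRIMARY KEY, name TEXT)");
                st.execute("CREATE TABLE IF NOT EXISTS animal_type (id INTEGER PRIMARY KEY, name TEXT)");
                // Index the columns you search on so lookups don't scan the whole table
                st.execute("CREATE INDEX IF NOT EXISTS idx_animal_name ON animal(name)");
            }
        }
    }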

The hard work is identifying the duplicate data in the 6 GB file and grouping and categorising it. But when you are done, you'll have lightning-fast searches compared to what you have now. Do seek help from someone who has designed their fair share of databases, though. You could ask for tips on https://dba.stackexchange.com/ about which database type to choose and how to set it up.
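Once the schema is in place, you'd do a one-time import of the .tab files with batched inserts, and from then on a user request is an indexed query instead of a file scan. Very rough sketch, again with placeholder column positions and ids:

    import java.io.BufferedReader;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ImportAndQuery {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:animals.db")) {
                conn.setAutoCommit(false); // commit in batches, much faster for bulk loads
                try (BufferedReader reader = Files.newBufferedReader(Paths.get("animals.tab"));
                     PreparedStatement insert = conn.prepareStatement(
                             "INSERT INTO animal(name, continent_id, type_id) VALUES (?, ?, ?)")) {
                    String line;
                    int count = 0;
                    while ((line = reader.readLine()) != null) {
                        String[] cols = line.split("\t");
                        insert.setString(1, cols[0]); // assumed: column 0 is the animal name
                        insert.setInt(2, 1);          // placeholder: look up the real continent id
                        insert.setInt(3, 1);          // placeholder: look up the real type id
                        insert.addBatch();
                        if (++count % 10_000 == 0) {  // flush every 10k rows
                            insert.executeBatch();
                            conn.commit();
                        }
                    }
                    insert.executeBatch();
                    conn.commit();
                }
                // A user search is now a single indexed query instead of a file scan
                try (PreparedStatement query = conn.prepareStatement(
                        "SELECT a.name, c.name FROM animal a " +
                        "JOIN continent c ON a.continent_id = c.id WHERE a.name = ?")) {
                    query.setString(1, "cheetah");
                    try (ResultSet rs = query.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getString(1) + " / " + rs.getString(2));
                        }
                    }
                }
            }
        }
    }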

Upvotes: 2
