Reputation: 31
I have several big data files (about 1 GB each) containing people's information (just names and phone numbers). The format is clear and flexible. The problem is loading and processing them. Processing one of them may still be feasible, but if I want to process all of them under a certain directory, things get tricky. When I use
File file = chooser.getSelectedFile();
and get a directory, I think the next step is to put the files into a File array:
File[] files = file.listFiles();
But will that cause a problem? Since each file is 1 GB, the VM's memory won't be able to hold all these files at once. In order to search them later, I think I may want to sort them first. How can I sort these individual files? Since the total size is so big, the idea of splitting them into files like A.txt and B.txt, holding the names that start with A and B respectively, is not sufficient.
Upvotes: 3
Views: 294
Reputation: 310915
A File just represents the file name, not the contents. Unless you have many thousands of files per directory, you haven't done anything to use much memory yet.
Don't try to process these files by loading each one entirely into memory though.
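A minimal sketch of that advice, assuming each line is a "name,phone" record (the question doesn't give the exact format). listFiles() only yields File objects (paths), and each file is then read one line at a time, so memory use stays small no matter how large the files are. The class and method names (DirectoryScan, findPhone) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class DirectoryScan {

    // Scans every regular file in dir, one line at a time, for a "name,phone" record.
    // The "name,phone" line format is an assumption; adjust the parsing to the real format.
    static String findPhone(File dir, String name) throws IOException {
        File[] files = dir.listFiles(); // File objects hold paths only, not contents
        if (files == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        for (File f : files) {
            if (!f.isFile()) {
                continue;
            }
            // Only the current line (plus the reader's small buffer) is in memory at any time.
            try (BufferedReader reader = Files.newBufferedReader(f.toPath(), StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);
                    if (parts.length == 2 && parts[0].trim().equals(name)) {
                        return parts[1].trim();
                    }
                }
            }
        }
        return null; // not found in any file
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DirectoryScan <directory> <name>
        System.out.println(findPhone(new File(args[0]), args[1]));
    }
}
```

This is a linear scan, so each lookup reads every file; it only shows that the memory problem goes away, not that searching becomes fast.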
Upvotes: 1
Reputation: 2828
When you have more data than your primary memory can hold, you start using secondary memory. So the question boils down to what you want to do with the names and phone numbers.
Let's say you have 100 files with names and phone numbers placed randomly, and you need your program to find the phone number for a name quickly. The ideal way would be to create a HashMap with the name as the key and the phone number as the value. But since your memory cannot hold the entire contents, you might want to consider storing the data in a better way in secondary memory. For example, store all names that start with A in a file called A.txt, all names starting with B in B.txt, and so on.
Now when you want to search for a name, find the starting character and look in the corresponding file.
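A minimal sketch of this bucketing scheme, assuming each line is a "name,phone" record. It streams the input files once, appending every line to a per-letter file, and a lookup then reads only the one bucket the name can be in. The class and method names (LetterBuckets, split, lookup) and the other.txt bucket for names not starting with A-Z are hypothetical choices for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

public class LetterBuckets {

    // Bucket file for a name: "A.txt" ... "Z.txt", or "other.txt" (hypothetical naming).
    static String bucketFor(String name) {
        char c = name.isEmpty() ? '?' : Character.toUpperCase(name.charAt(0));
        return (c >= 'A' && c <= 'Z') ? c + ".txt" : "other.txt";
    }

    // One sequential pass: each input line is appended to the bucket of its first letter.
    static void split(File[] inputs, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        Map<String, BufferedWriter> writers = new HashMap<>(); // at most 27 open writers
        try {
            for (File in : inputs) {
                try (BufferedReader reader = Files.newBufferedReader(in.toPath(), StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String bucket = bucketFor(line.trim());
                        BufferedWriter w = writers.get(bucket);
                        if (w == null) {
                            w = Files.newBufferedWriter(outDir.resolve(bucket), StandardCharsets.UTF_8,
                                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                            writers.put(bucket, w);
                        }
                        w.write(line);
                        w.newLine();
                    }
                }
            }
        } finally {
            for (BufferedWriter w : writers.values()) {
                w.close();
            }
        }
    }

    // Reads only the single bucket file that could contain the name.
    static String lookup(Path outDir, String name) throws IOException {
        Path bucket = outDir.resolve(bucketFor(name));
        if (!Files.exists(bucket)) {
            return null;
        }
        try (BufferedReader reader = Files.newBufferedReader(bucket, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2);
                if (parts.length == 2 && parts[0].trim().equals(name)) {
                    return parts[1].trim();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LetterBuckets <inputDir> <bucketDir> <name>
        File[] inputs = new File(args[0]).listFiles(File::isFile);
        split(inputs, Paths.get(args[1]));
        System.out.println(lookup(Paths.get(args[1]), args[2]));
    }
}
```

Splitting costs one pass over all the data; after that, each lookup touches roughly one bucket instead of every file. If a single bucket is still too large (the concern raised in the question), the same idea can be keyed on the first two letters instead of one.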
Upvotes: 0
Reputation: 13481
A File only represents the path to a file, not the contents of the file itself. The contents end up on the Java heap only when you open that file and read from it.
I'd suggest using InputStreams and processing the file contents as you read them, rather than loading the whole file into memory and then processing it.
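A minimal sketch of processing an InputStream as it is read: the stream is consumed in fixed-size chunks, and each chunk is handled before the next one is read, so memory use is bounded by the buffer size rather than the file size. Counting newline-terminated records stands in for whatever real processing is needed; the class and method names (StreamCount, countRecords) and the 64 KiB buffer size are arbitrary assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamCount {

    // Counts records (lines) by scanning the stream chunk by chunk.
    // Only one buffer's worth of bytes is in memory at any time.
    static long countRecords(InputStream in) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // 64 KiB; an arbitrary choice
        long records = 0;
        int last = '\n'; // so that empty input counts as zero records
        int n;
        while ((n = in.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    records++;
                }
            }
            if (n > 0) {
                last = buffer[n - 1];
            }
        }
        if (last != '\n') {
            records++; // final record without a trailing newline
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StreamCount <file>
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(countRecords(in) + " records");
        }
    }
}
```

Because countRecords takes any InputStream, the same method works for a FileInputStream on a 1 GB file or a small in-memory stream in a test.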
Upvotes: 0