Webman

Reputation: 1554

Double while loop to read a text file

Is it possible to read a text file with a double while/for loop?

I'd like to do something like this:

for (int row1 = 0; row1 < file.length; row1++) {
    for (int row2 = row1 + 1; row2 < file.length; row2++) {
        if (file[row1] == file[row2]) {
            // other code
        }
    }
}

I need a double loop because I have to find a duplicate row in a file with 2,500,000 rows. I can't use a Set to save the rows because the heap size is insufficient, and if I try to increase it I get this error: "Error occurred during initialization of VM. Could not reserve enough space for object heap. Could not create the Java virtual machine." (I'm on Windows 7 64-bit with 8 GB of RAM.)

Thanks in advance

Upvotes: 2

Views: 1174

Answers (3)

Jeff Ferland

Reputation: 18282

Based upon your question and the comments following it, your goal is to find duplicates in a large file. Worst-case for this is O(N^2) -- comparing every object to every other object. The better solution is to sort them first.

Because the file is too large for you to allocate enough memory to sort it in memory, you need a different approach. The question "How could the UNIX sort command sort a very large file?" provides some details of an implementation. The generic problem is "external sorting".

The pseudo-code from the Wikipedia page should be easy enough to follow and implement. If you're feeling really brave, you can tackle the algorithmic details from the Unix sort command and the corresponding pages of the Knuth book.

... and finally, some Googled code that I haven't really reviewed or tested:
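In that spirit, a minimal external merge sort might look like the following. This is a sketch, not a reviewed implementation: the chunk size is illustrative, temp-file cleanup is omitted, and the k-way merge uses a priority queue of chunk readers.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class ExternalSort {
    // Illustrative chunk size: how many lines we are willing to hold in memory.
    static final int CHUNK_LINES = 100_000;

    // Split the input into sorted chunks that fit in memory, then merge them.
    public static void sort(Path input, Path output) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == CHUNK_LINES) {
                    chunks.add(writeSortedChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) chunks.add(writeSortedChunk(buffer));
        }
        mergeChunks(chunks, output);
    }

    // Sort one in-memory chunk and spill it to a temp file.
    static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path chunk = Files.createTempFile("chunk", ".txt");
        Files.write(chunk, lines);
        return chunk;
    }

    // k-way merge: always emit the smallest current line among all chunks.
    static void mergeChunks(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<Map.Entry<String, BufferedReader>> queue =
                new PriorityQueue<>(Map.Entry.<String, BufferedReader>comparingByKey());
        for (Path chunk : chunks) {
            BufferedReader r = Files.newBufferedReader(chunk);
            String first = r.readLine();
            if (first != null) queue.add(new AbstractMap.SimpleEntry<>(first, r));
        }
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            while (!queue.isEmpty()) {
                Map.Entry<String, BufferedReader> head = queue.poll();
                out.write(head.getKey());
                out.newLine();
                String next = head.getValue().readLine();
                if (next != null) {
                    queue.add(new AbstractMap.SimpleEntry<>(next, head.getValue()));
                } else {
                    head.getValue().close();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("in", ".txt");
        Files.write(in, Arrays.asList("banana", "apple", "banana", "cherry"));
        Path out = Files.createTempFile("out", ".txt");
        sort(in, out);
        System.out.println(Files.readAllLines(out));
    }
}
```

Once the file is sorted this way, duplicates sit on adjacent lines and can be found in a single pass.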

Upvotes: 1

G_H

Reputation: 11999

You can do that, but the performance is O(n²), which isn't good. Also, beware of using ==: it checks whether the two references point to the same object, which is not the same as equals(). Maybe you can calculate a hash for each row and use that to sniff out possible collisions.
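One hedged way to realize that hash idea (my sketch, not from the answer): keep only 32-bit line hashes in memory on a first pass, then compare with equals() only the lines whose hash repeats. Method and class names here are illustrative.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class HashPrefilter {
    // Pass 1: record which line hashes occur more than once (ints, not strings).
    // Pass 2: keep only lines with a repeated hash and compare them with equals(),
    // so memory holds full strings only for the rare hash collisions.
    public static List<String> findDuplicates(Path file) throws IOException {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> repeated = new HashSet<>();
        try (BufferedReader in = Files.newBufferedReader(file)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!seen.add(line.hashCode())) repeated.add(line.hashCode());
            }
        }
        Set<String> candidates = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (repeated.contains(line.hashCode()) && !candidates.add(line)) {
                    duplicates.add(line); // equal strings, not merely equal hashes
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("rows", ".txt");
        Files.write(f, Arrays.asList("alpha", "beta", "alpha", "gamma"));
        System.out.println(findDuplicates(f));
    }
}
```

Note the trade-off: the two `HashSet<Integer>` instances still cost memory, but far less than holding 2,500,000 full row strings.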

Upvotes: 0

Moishe Lettvin

Reputation: 8471

Sort the original file first (you can split it up and use a merge sort). Then find duplicates in a single pass: if prev equals cur, you've found a duplicate.

Upvotes: 6
