Reputation: 389
The files under folder1 and folder2 have the same names, and I want to compare each pair of files. I am stuck on this. Is there a Java API for this kind of comparison? The files may be huge.
Example:
folder1/file1
----------
kushi,metha,2
kushi,barun,1
arun,mital,3
folder2/file1
----------
arun,mital,3
kushi,metha,2
sheetal,kumar,3
kushi,barun,1
Comparing folder1/file1 with folder2/file1 should return the line "sheetal,kumar,3". I tried googling but could not find anything useful.
Upvotes: 1
Views: 871
Reputation: 1505
I encountered the same problem and wrote a comparison function:
import java.util.Arrays;
import java.util.Objects;

/**
 * Compare two sequences of lines without considering order.
 * <p>
 * The input arrays are not modified.
 */
public static <T> boolean isEqualWithoutOrder(final T[] lines1, final T[] lines2) {
    if (lines1 == null && lines2 == null) return true;
    if (lines1 == null) return false;
    if (lines2 == null) return false;
    if (lines1.length != lines2.length) return false;
    final int length = lines1.length;
    int equalCnt = 0;
    // mask[j] stays true while lines1[j] has not been matched yet
    final boolean[] mask = new boolean[length];
    Arrays.fill(mask, true);
    for (int i = 0; i < lines2.length; i++) {
        final T line2 = lines2[i];
        for (int j = 0; j < lines1.length; j++) {
            final T line1 = lines1[j];
            if (mask[j] && Objects.equals(line1, line2)) {
                equalCnt++;
                mask[j] = false;
                // once two equal lines are found, the following lines are likely to match too;
                // only consume lines1[j + 1] if it has not been matched already
                while (j + 1 < length && i + 1 < length && mask[j + 1] &&
                        Objects.equals(lines1[j + 1], lines2[i + 1])) {
                    equalCnt++;
                    mask[j + 1] = false;
                    j++;
                    i++;
                }
                break;
            }
        }
        // lines2[0..i] must all be matched by now; if not, the current line had no partner
        if (equalCnt <= i) return false;
    }
    return equalCnt == length;
}
Approaches built on the standard collections may be slower. A speed comparison, written in Scala:
// lines1, lines2: Seq[String], each holding the same 100k random strings in a different order.
FastUtils.isEqualWithoutOrder(lines1.toArray, lines2.toArray) // 97 ms
lines1.sorted == lines2.sorted // 836 ms
Times were measured in a warm sbt session.
(Disclaimer: I have only run some basic tests against this function.)
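Since the function above has only been lightly tested, one way to cross-check it is against a simple counting-based reference. The following is a minimal sketch, not part of the original answer: the class and method names are hypothetical, and it assumes the elements have consistent equals and hashCode implementations (as String does).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reference implementation, intended only for cross-checking results. */
final class UnorderedCompareReference {

    /** Returns true if both arrays hold the same elements with the same multiplicities. */
    static <T> boolean sameElementsIgnoringOrder(T[] a, T[] b) {
        if (a == null || b == null) return a == b;
        if (a.length != b.length) return false;

        // Count how often each element occurs in the first array.
        Map<T, Integer> counts = new HashMap<>();
        for (T item : a) {
            counts.merge(item, 1, Integer::sum);
        }

        // Consume one occurrence for every element of the second array.
        for (T item : b) {
            Integer remaining = counts.get(item);
            if (remaining == null) return false; // b has more of this element than a
            if (remaining == 1) {
                counts.remove(item);
            } else {
                counts.put(item, remaining - 1);
            }
        }
        return counts.isEmpty();
    }
}
```

Feeding the same inputs to both functions, including inputs with repeated lines such as {"a", "b", "c"} versus {"b", "a", "b"}, should give identical answers.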
Upvotes: 0
Reputation: 415
I know this is not a pure Java solution, but if you have access to a *nix box:
sort file1 > sorted1; sort file2 > sorted2; comm -3 sorted1 sorted2
This prints the lines that appear in only one of the two files, which is exactly what you need.
Then take a look at this question on how to run shell scripts from Java.
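If you would rather keep the comparison step inside Java, the comm -3 part can be approximated by walking the two sorted inputs in lockstep. This is a minimal sketch under assumptions: the class and method names are hypothetical, and both inputs must already be sorted in String.compareTo order (for plain ASCII data, sorting with LC_ALL=C should produce that order).

```java
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical helper that mimics "comm -3" on two pre-sorted line sequences. */
final class SortedLineDiff {

    static void diffSorted(Iterator<String> sorted1, Iterator<String> sorted2,
                           Consumer<String> onlyInFirst,
                           Consumer<String> onlyInSecond) {
        String a = next(sorted1);
        String b = next(sorted2);
        while (a != null && b != null) {
            int cmp = a.compareTo(b);
            if (cmp == 0) {
                // Same line in both inputs: skip it on both sides.
                a = next(sorted1);
                b = next(sorted2);
            } else if (cmp < 0) {
                // a sorts before b, so a cannot appear later in the second input.
                onlyInFirst.accept(a);
                a = next(sorted1);
            } else {
                onlyInSecond.accept(b);
                b = next(sorted2);
            }
        }
        // Whatever is left on either side has no counterpart.
        for (; a != null; a = next(sorted1)) onlyInFirst.accept(a);
        for (; b != null; b = next(sorted2)) onlyInSecond.accept(b);
    }

    /** Returns the next line, or null when the input is exhausted. */
    private static String next(Iterator<String> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```

For files, the iterators could come from Files.lines(path).iterator(), so only one line per file is held in memory at a time.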
EDIT:
What I am trying to say is that for you to compute the diff there are 2 steps :
Upvotes: 2
Reputation: 52185
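Depending on what you mean by huge, you could use a HashSet: first go through one file and add each line to the set, then go through the other file and remove each line you read from the set. Lines of the second file that are not in the set exist only in the second file, and whatever remains in the set at the end exists only in the first. This assumes that each line is unique within its file and that the lines of the first file fit in memory.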
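The HashSet idea could look roughly like this in Java. It is a sketch, not a definitive implementation: the class, the Diff holder type and its field names are hypothetical, and it relies on the same assumption that each line is unique within its file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

/** Hypothetical helper that diffs two line streams using a HashSet. */
final class HashSetLineDiff {

    /** Holder for the two result collections (illustrative type). */
    static final class Diff {
        final Set<String> onlyInFirst = new HashSet<>();
        final List<String> onlyInSecond = new ArrayList<>();
    }

    static Diff diff(Stream<String> firstLines, Stream<String> secondLines) {
        Diff result = new Diff();

        // Pass 1: remember every line of the first input.
        firstLines.forEach(result.onlyInFirst::add);

        // Pass 2: a line that can be removed exists in both inputs;
        // a line that cannot be removed exists only in the second input.
        secondLines.forEach(line -> {
            if (!result.onlyInFirst.remove(line)) {
                result.onlyInSecond.add(line);
            }
        });
        return result;
    }
}
```

With files, the streams could be opened with Files.lines(path) inside a try-with-resources block. Only the first file is held in memory, so it may help to pass the smaller file first. For the example in the question, onlyInSecond would contain the single line sheetal,kumar,3.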
Upvotes: 0