Reputation: 99
so here is all of my code for reference.
import java.io.*;
import java.util.*;
public class Plagiarism {
public static void main(String[] args) throws Exception {
//you are not using 'myPlag' anywhere, you can safely remove it
// Plagiarism myPlag = new Plagiarism();
if (args.length == 0) {
System.out.println("Error: No files input");
System.exit(0);
}
String foo = null;
for (int i = 0; i < 2; i++) {
BufferedReader reader = new BufferedReader(new FileReader(args[i]));
foo = simplify(reader);
// System.out.print(foo);
int blockSize = Integer.valueOf(args[2]);
List<String> list = new ArrayList<String>();
for (int k = 0; k < foo.length() - blockSize + 1; k++) {
list.add(foo.substring(k, k + blockSize));
int x = 33;
int hash = 0;
for (String str: list) {
for (int o = 0; o < str.length(); o++) {
hash = 33*hash + str.charAt(o);
}
}
System.out.println(hash);
/* List<Integer> newList = new ArrayList<Integer>(list.size());
for (String myInt : list) {
newList.add(Integer.parseInt(myInt));
int x = 33;
int hash = 0;
for (int o = 0; o < newList.size(); o++) {
hash = x*hash + newList.get(o);
}
} */
}
// System.out.print(list);
}
}
public static String simplify(BufferedReader input)
throws IOException {
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = input.readLine()) != null) {
sb.append(line.replaceAll("[^a-zA-Z]", "").toLowerCase());
}
return sb.toString();
}
}
Although I want to in particular focus on this part:
int x = 33;
int hash = 0;
for (String str: list) {
for (int o = 0; o < str.length(); o++) {
hash = 33*hash + str.charAt(o);
}
}
System.out.println(hash);
Some of the values returned are negative hash values. Why is this? Even when the block size is small (ie. 2) it is still doing it. I know it is something to do with "modulo p" perhaps? I am using Horner's polynomial method here.
I'm wondering if I could get some help on this?
Thanks guys in advance.
Upvotes: 0
Views: 5125
Reputation: 727047
Negative values are caused by integer overflow. Any integer number with the most significant bit set to 1
is interpreted as a negative number.
Hash codes do not signify anything in particular: all that is required of them is to be the same for equal values, and try to be as different as possible for non-equal values. That is why integer overflow can be safely ignored when dealing with hash codes.
Upvotes: 2
Reputation: 234875
A hash is an int
type which can take negative values. A negative value should not concern you.
When a java int
gets too big (just over 2 billion), it will wrap round to a negative value. That is what is happening here: your multiplication of 33 will eventually cause this wraparound to a negative.
Upvotes: 0