user3364788
user3364788

Reputation: 99

Java negative value hash codes

so here is all of my code for reference.

import java.io.*;
import java.util.*;

public class Plagiarism {

    public static void main(String[] args) throws Exception {
        //you are not using 'myPlag' anywhere, you can safely remove it
//      Plagiarism myPlag = new Plagiarism();

        if (args.length == 0) {
            System.out.println("Error: No files input");
            System.exit(0);
        }

        String foo = null;
        for (int i = 0; i < 2; i++) {
            BufferedReader reader = new BufferedReader(new FileReader(args[i]));
            foo = simplify(reader);
            // System.out.print(foo);
            int blockSize = Integer.valueOf(args[2]);

            List<String> list = new ArrayList<String>();
            for (int k = 0; k < foo.length() - blockSize + 1; k++) {
                list.add(foo.substring(k, k + blockSize));

                int x = 33;
                int hash = 0;
                for (String str: list) {
                    for (int o = 0; o < str.length(); o++) {
                        hash = 33*hash + str.charAt(o);
                    }
                }
                System.out.println(hash);

                /* List<Integer> newList = new ArrayList<Integer>(list.size());
                for (String myInt : list) {
                    newList.add(Integer.parseInt(myInt));

                    int x = 33;
                    int hash = 0;
                    for (int o = 0; o < newList.size(); o++) {
                        hash = x*hash + newList.get(o);
                    }
                } */


            }
            // System.out.print(list);
        }




    }

    public static String simplify(BufferedReader input)
            throws IOException {

        StringBuilder sb = new StringBuilder();
        String line = null;
        while ((line = input.readLine()) != null) {
            sb.append(line.replaceAll("[^a-zA-Z]", "").toLowerCase());
        }
        return sb.toString();
    }
}

Although I want to in particular focus on this part:

int x = 33;
                    int hash = 0;
                    for (String str: list) {
                        for (int o = 0; o < str.length(); o++) {
                            hash = 33*hash + str.charAt(o);
                        }
                    }
                    System.out.println(hash);

Some of the values returned are negative hash values. Why is this? Even when the block size is small (ie. 2) it is still doing it. I know it is something to do with "modulo p" perhaps? I am using Horner's polynomial method here.

I'm wondering if I could get some help on this?

Thanks guys in advance.

Upvotes: 0

Views: 5125

Answers (2)

Sergey Kalinichenko
Sergey Kalinichenko

Reputation: 727047

Negative values are caused by integer overflow. Any integer number with the most significant bit set to 1 is interpreted as a negative number.

Hash codes do not signify anything in particular: all that is required of them is to be the same for equal values, and try to be as different as possible for non-equal values. That is why integer overflow can be safely ignored when dealing with hash codes.

Upvotes: 2

Bathsheba
Bathsheba

Reputation: 234875

A hash is an int type which can take negative values. A negative value should not concern you.

When a java int gets too big (just over 2 billion), it will wrap round to a negative value. That is what is happening here: your multiplication of 33 will eventually cause this wraparound to a negative.

Upvotes: 0

Related Questions