Slippery John

Reputation: 757

Smallest Java structure with relatively decent contains() solution

Alright, here's the lowdown: I'm writing a class in Java that finds the Nth Hardy's Taxi number (a number that can be written as the sum of two cubes in two different ways). I have the discovery itself down, but I'm in desperate need of some space saving. To that end, I need the smallest possible data structure where I can relatively easily use or create a method like contains(). I'm not particularly worried about speed, as my current solution can certainly compute well within the time restrictions.

In short, the data structure needs to be as small as possible while still supporting (or making it easy to implement) a contains() check.

Any ideas? I started with a hash map (because I needed to test the values that led to the sum, to ensure accuracy), then moved to a hash set once I had guaranteed reliable answers.

Any other general ideas on how to save some space would be greatly appreciated!

I don't think you'd need the code to answer the question, but here it is in case you're curious:

import java.util.HashSet;

public class Hardy {
//  private static HashMap<Long, Long> hm;


/**
 * Find the nth Hardy number (start counting with 1, not 0) and the numbers
 *      whose cubes demonstrate that it is a Hardy number.
 * @param n
 * @return the nth Hardy number
 */
public static long nthHardyNumber(int n) {
//      long i, j, oldValue;
    int i, j;
    int counter = 0;
    long xyLimit = 2147483647; // xyLimit is the max value of a 32bit signed number
    long sum;
//      hm = new HashMap<Long, Long>();
    int hardyCalculations = (int) (n * 1.1);
    HashSet<Long> hs = new HashSet<Long>(hardyCalculations * hardyCalculations, (float) 0.95);
    long[] sums = new long[hardyCalculations];

//      long binaryStorage, mask = 0x00000000FFFFFFFF;

    for (i = 1; i < xyLimit; i++){
        for (j = 1; j <= i; j++){
//              binaryStorage = ((i << 32) + j);
//              long y = ((binaryStorage << 32) >> 32) & mask;
//              long x = (binaryStorage >> 32) & mask;

            sum = cube(i) + cube(j);
            if (hs.contains(sum) && !arrayContains(sums, sum)){
//                  oldValue = hm.get(sum);
//                  long oldY = ((oldValue << 32) >> 32) & mask;
//                  long oldX = (oldValue >> 32) & mask;
//                  if (oldX != x && oldX != y){
                sums[counter] = sum;
                counter++;
                if (counter == hardyCalculations){
//                      Arrays.sort(sums);
                    bubbleSort(sums);
                    return sums[n - 1];
                }
            } else {
                hs.add(sum);
            }
        }
    }
    return 0;
}

private static void bubbleSort(long[] array){
    long current, next;
    int i;
    boolean ordered = false;

    while (!ordered) {
        ordered = true;
        for (i = 0; i < array.length - 1; i++){
            current = array[i];
            next = array[i + 1];
            if (current > next) {
                ordered = false;
                array[i] = next;
                array[i+1] = current;
            }
        }
    }
}

private static boolean arrayContains(long[] array, long n){
    for (long l : array){
        if (l == n){
            return true;
        }
    }
    return false;
}

private static long cube(long n){
    return n*n*n;
}
}

Upvotes: 0

Views: 530

Answers (3)

Viren

Reputation: 2171

This is the core function for testing whether a given number is an HR number; it's in C, but one should get the idea:

#include <math.h>
#include <stdbool.h>

bool is_sum_of_cubes(int value)
{
    int m = pow(value, 1.0/3); /* integer cube root of value */
    int i = m;
    int j = 1;

    /* Two-pointer walk: grow j when the sum is too small, shrink i when too big. */
    while (j < m && i >= 0)
    {
        int element = i*i*i + j*j*j;
        if (value == element)
        {
            return true;
        }
        if (element < value)
        {
            ++j;
        }
        else
        {
            --i;
        }
    }

    return false;
}
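
As written, this returns true on the first representation found, while a Hardy number needs at least two distinct representations. Here is a sketch of the same two-pointer walk adapted to Java to count representations; the method name and the +1 guard against cube-root rounding are my assumptions, not part of the code above:

// Counts representations of value as i^3 + j^3 with 1 <= j <= i, using the
// same two-pointer walk as the C version above, and requires two of them.
static boolean isHardyNumber(long value) {
    long i = (long) Math.cbrt((double) value) + 1; // +1 guards against cbrt rounding low
    long j = 1;
    int representations = 0;
    while (j <= i) {
        long element = i * i * i + j * j * j;
        if (element == value) {
            representations++;  // found one pair; move both pointers inward
            j++;
            i--;
        } else if (element < value) {
            j++;                // sum too small: raise the lower cube
        } else {
            i--;                // sum too big: lower the upper cube
        }
    }
    return representations >= 2;
}

For example, isHardyNumber(1729) finds 1729 = 1^3 + 12^3 = 9^3 + 10^3 and returns true.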

Upvotes: 0

npgall

Reputation: 3028

If you have an extremely large number of elements, and you effectively want an index to allow fast tests for containment in the underlying dataset, then take a look at Bloom Filters. These are space-efficient indexes whose sole purpose is to enable fast tests for containment in a dataset.

Bloom Filters are probabilistic, which means if they return true for containment, then you actually need to check your underlying dataset to confirm that the element is really present.

If they return false, the element is guaranteed not to be contained in the underlying dataset, and in that case the test for containment would be very cheap.

So it depends on whether, most of the time, you expect a candidate to really be contained in the dataset or not.
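
To make the idea concrete, here is a minimal Bloom filter over long values built on java.util.BitSet; the class name, the double-hashing mix, and the sizing parameters are illustrative assumptions, not anything prescribed by the technique:

import java.util.BitSet;

// A minimal Bloom filter for long values -- an illustrative sketch, not
// production code. The hash mixing constants below are assumptions.
class LongBloomFilter {
    private final BitSet bits;
    private final int size; // number of bits in the filter
    private final int k;    // number of hash functions

    LongBloomFilter(int size, int k) {
        this.bits = new BitSet(size);
        this.size = size;
        this.k = k;
    }

    // Derive the i-th bit index via double hashing of the value.
    private int index(long value, int i) {
        long h1 = value * 0x9E3779B97F4A7C15L;
        long h2 = Long.rotateLeft(value, 31) * 0xC2B2AE3D27D4EB4FL;
        return (int) Math.floorMod(h1 + i * h2, (long) size);
    }

    void add(long value) {
        for (int i = 0; i < k; i++) {
            bits.set(index(value, i));
        }
    }

    // May return a false positive, but never a false negative.
    boolean mightContain(long value) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }
}

In this particular problem, though, a positive mightContain(sum) would still have to be verified exactly before recording sum as a Hardy number; an unverified false positive would silently produce a wrong answer.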

Upvotes: 0

phs

Reputation: 11061

Have you considered using a standard tree? In Java that would be a TreeSet. By sacrificing speed, a tree generally gains back space over a hash.

For that matter, sums might be a TreeMap, transforming the linear arrayContains to a logarithmic operation. Being naturally ordered, there would also be no need to re-sort it afterwards.
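
A sketch of that restructuring, with one TreeSet for the seen sums and a second one standing in for the sums collection (the class and variable names are mine, and the question's n * 1.1 overshoot heuristic is kept as-is):

import java.util.Iterator;
import java.util.TreeSet;

public class HardyTree {
    // Mirrors the question's loop, with TreeSet replacing both the HashSet
    // and the sums array.
    public static long nthHardyNumber(int n) {
        int target = (int) (n * 1.1);              // same overshoot as the question
        TreeSet<Long> seen = new TreeSet<Long>();  // every sum of two cubes so far
        TreeSet<Long> hardy = new TreeSet<Long>(); // sums seen twice, kept sorted
        for (long i = 1; ; i++) {
            for (long j = 1; j <= i; j++) {
                long sum = i * i * i + j * j * j;
                if (!seen.add(sum)) {              // add() doubles as the contains() test
                    hardy.add(sum);
                    if (hardy.size() == target) {
                        Iterator<Long> it = hardy.iterator();
                        for (int k = 1; k < n; k++) {
                            it.next();
                        }
                        return it.next();          // nth smallest; no re-sort needed
                    }
                }
            }
        }
    }
}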

EDIT

The complaint against using a Java tree structure for sums is that Java's tree types don't support the k-select algorithm. On the assumption that Hardy numbers are rare, perhaps you don't need to sweat the complexity of this container (in which case your array is fine).

If you did need to improve time performance of this aspect, you could consider using a selection-enabled tree such as the one mentioned here. However that solution works by increasing the space requirement, not lowering it.

Alternately, we can incrementally throw out Hardy numbers we know we don't need. Suppose that during the running of the algorithm, sums already contains n Hardy numbers and we discover a new one. We insert it and do whatever we need to preserve collection order, so sums now contains n+1 sorted elements.

Consider that last element. We already know about n smaller Hardy numbers, so there is no possible way this last element is our answer. Why keep it? At this point we can shrink sums back down to size n and toss the largest element out. This is both a space savings and a time savings, as we have fewer elements to maintain in sorted order.

The natural data structure for sums in that approach is a max heap. Java has no dedicated max-heap class, but a PriorityQueue with a reversed comparator behaves as one, and a few 3rd-party implementations are floating around. You could "make it work" with TreeMap::lastKey, which will be slower in the end, but still faster than quadratic bubbleSort.
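
A sketch of that pruning using the JDK's PriorityQueue flipped into a max heap; the class and method names are mine, chosen for illustration:

import java.util.Collections;
import java.util.PriorityQueue;

public class BoundedHardyHeap {
    private final int n;
    // PriorityQueue is a min-heap by default; reverseOrder() flips it into
    // a max heap so peek()/poll() expose the largest kept Hardy number.
    private final PriorityQueue<Long> smallest =
            new PriorityQueue<Long>(Collections.reverseOrder());

    BoundedHardyHeap(int n) {
        this.n = n;
    }

    // Offer a newly discovered Hardy number, keeping only the n smallest.
    void offer(long hardyNumber) {
        if (smallest.size() < n) {
            smallest.add(hardyNumber);
        } else if (hardyNumber < smallest.peek()) {
            smallest.poll();           // toss the largest; it can't be the answer
            smallest.add(hardyNumber);
        }
    }

    // Once enough numbers have been kept, the heap's max is the nth smallest.
    long nthSmallest() {
        return smallest.peek();
    }
}

As in the question, this only yields the true nth Hardy number once the search has run long enough that no smaller one can still appear; the n * 1.1 overshoot plays that role.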

Upvotes: 0
