Reputation: 1086
Is there any efficient way of getting the nth element of a set in Java? I know 2 ways of doing it:
- By iterating through it until I reach the required element
- By converting it to an ArrayList and getting the element from that ArrayList
The question is: is there any other way to get the nth element quickly? I would mainly need a feature like that for TreeSets.
EDIT: For example, if I want to select 1000 random elements from a TreeMap or TreeSet with 10,000,000 entries, very frequently (i.e. every 2-3 seconds), then copying it into an ArrayList every time is very inefficient, and iterating through so many elements is also inefficient.
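For reference, here is a rough sketch of the two approaches I mean (the method names nthByIteration and nthByCopy are just for illustration):
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class NthElement {

    // Walk the set with an iterator and stop at position n (O(n) time, no extra memory)
    static <T> T nthByIteration(Set<T> set, int n) {
        Iterator<T> iter = set.iterator();
        for (int i = 0; i < n; i++) {
            iter.next();
        }
        return iter.next();
    }

    // Copy the set into an ArrayList and use index-based access
    // (O(n) copy each time, then O(1) per lookup)
    static <T> T nthByCopy(Set<T> set, int n) {
        List<T> list = new ArrayList<>(set);
        return list.get(n);
    }
}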
Upvotes: 2
Views: 2184
Reputation: 7152
If you are sure that you need n elements from random positions in the set (a kind of statistical sampling), then you may want to consider iterating through the set once and picking up each element as a sample with the desired probability as you go. This is more efficient because you iterate through the set only once.
The following program demonstrates the idea:
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class SamplingFromSet {

    public static void main(String[] args) {
        Set<String> population = new TreeSet<>();
        /*
         * Populate the set
         */
        final int popSize = 17;
        for (int i = 0; i < popSize; i++) {
            population.add(getRandomString());
        }
        List<String> sample
                = sampleFromPopulation(population, 3 /* sampleSize */);
        System.out.println("population is");
        System.out.println(population.toString());
        System.out.println("sample is");
        System.out.println(sample.toString());
    }

    /**
     * Pick some samples from a population
     * @param population
     * @param sampleSize - number of samples
     * @return
     */
    private static <T>
    List<T> sampleFromPopulation(Set<T> population
            , int sampleSize) {
        float sampleProb = ((float) sampleSize) / population.size();
        List<T> sample = new ArrayList<>();
        Iterator<T> iter = population.iterator();
        while (iter.hasNext()) {
            T element = iter.next();
            if (random.nextFloat() < sampleProb) {
                /*
                 * Lucky Draw!
                 */
                sample.add(element);
            }
        }
        return sample;
    }

    private static Random random = new Random();

    private static String getRandomString() {
        return String.valueOf(random.nextInt());
    }
}
Output of this program:
population is
[-1488564139, -1510380623, -1980218182, -354029751, -564386445, -57285541, -753388655, -775519772, 1538266464, 2006248253, 287039585, 386398836, 435619764, 48109172, 580324150, 64275438, 860615531]
sample is
[-57285541, -753388655, 386398836]
Update
The above program, however, has a caveat: since the samples are picked up by probability during that single walk through the set (each element is kept independently with probability sampleSize/popSize, so the resulting count is not fixed), the returned sample may, depending on your luck of the day, contain fewer or more elements than specified. This problem, however, can be remedied with a slight change of procedure, which uses a slightly different method signature:
/**
 * Pick some samples from a population
 * @param population
 * @param sampleSize - number of samples
 * @param exactSize - a boolean to control whether or not
 *        the returned sample list must be of the exact size
 *        as specified.
 * @return
 */
private static <T>
List<T> sampleFromPopulation(Set<T> population
        , int sampleSize
        , boolean exactSize);
Prevention of oversampling
During the single iteration through the population, we oversample a bit, and then at the end we drop some samples if we end up with too many.
Prevention of undersampling
Note also that, even with oversampling, there is a non-zero probability that, at the end of the one iteration through the population, we still get fewer samples than desired. If that happens (which is unlikely), we recursively call the same method again as a second try. (This recursion terminates with probability approaching one, because it is very unlikely that, across repeated recursive calls into the method, we consistently get undersampling.)
The following code implements the new sampleFromPopulation()
method:
// In addition to the imports above, this version also needs
// java.util.LinkedList, java.util.HashSet and java.util.Collections.
private static <T>
List<T> sampleFromPopulation(Set<T> population
        , int sampleSize
        , boolean exactSize) {
    int popSize = population.size();
    double sampleProb = ((double) sampleSize) / popSize;
    final double OVER_SAMPLING_MULT = 1.2;
    if (exactSize) {
        /*
         * Oversampling to enhance the chance of getting enough
         * samples (if we then have too many, we will drop some
         * later)
         */
        sampleProb = sampleProb * OVER_SAMPLING_MULT;
    }
    List<T> sample = new LinkedList<>(); // linked list for fast removal
    Iterator<T> iter = population.iterator();
    while (iter.hasNext()) {
        T element = iter.next();
        if (random.nextFloat() < sampleProb) {
            /*
             * Lucky Draw!
             */
            sample.add(element);
        }
    }
    int samplesTooMany = sample.size() - sampleSize;
    if (!exactSize || samplesTooMany == 0) {
        return sample;
    } else if (samplesTooMany > 0) {
        Set<Integer> indexesToRemoveAsSet = new HashSet<>();
        for (int i = 0; i < samplesTooMany; ) {
            int candidate = random.nextInt(sample.size());
            if (indexesToRemoveAsSet.add(candidate)) {
                /*
                 * add() returns true if candidate was not
                 * previously in the set
                 */
                i++; // proceed to draw the next index
            }
        }
        List<Integer> indexesToRemoveAsList
                = new ArrayList<>(indexesToRemoveAsSet);
        Collections.sort(indexesToRemoveAsList
                , (i1, i2) -> i2.intValue() - i1.intValue()); // descending order
        /*
         * Now we drop from the tail of the list
         */
        for (Integer index : indexesToRemoveAsList) {
            sample.remove((int) index); // remove by index (not by element)
        }
        return sample;
    } else {
        /*
         * We were unlucky: even with oversampling we still
         * got fewer samples than specified, so here we call
         * this very same method again recursively
         */
        return sampleFromPopulation(population, sampleSize, exactSize);
    }
}
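For the scenario in the question (picking 1000 random elements out of roughly 10,000,000, every few seconds), a call would look roughly like this; the variable names are made up for illustration:
Set<String> hugeSet = new TreeSet<>(); // your existing ~10,000,000-element set
// ... populate hugeSet ...
List<String> randomThousand = sampleFromPopulation(hugeSet, 1000, true);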
Upvotes: 1
Reputation: 88727
If your requirement is to select random elements out of a rather huge set then you should ask yourself whether a set is the best fit for that.
If you want to use the built-in sets there are several challenges you'd face.
TreeSet
A TreeSet is an ordered set and thus would allow you to access the n-th element. However, you'd have to iterate to position n, since there is no backing array that allows random access like an ArrayList has. As the name implies, the nodes in a TreeSet form a tree, and those nodes may be stored anywhere in memory. Because of this, to get the n-th element you'd have to start at the first node and hop from node to node until you reach position n - which is similar to how you'd do it in a LinkedList.
If all you want to do is select a random element, there are several options, for example calling tailSet(randomKey) and getting the first element of that tail set. Of course you'd have to handle random keys that are outside the elements' range. That way a lookup would basically be a binary search.
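A minimal sketch of that idea, assuming a TreeSet of Integer keys in a known range (for other element types you'd need some way to generate a random key in the key space; also note the pick is only uniform if the elements are spread fairly evenly over that range):
import java.util.NavigableSet;
import java.util.Random;
import java.util.SortedSet;
import java.util.TreeSet;

public class RandomFromTreeSet {

    private static final Random RANDOM = new Random();

    // Jump into the set with tailSet(randomKey) and take the first element of the tail.
    // This is an O(log n) lookup, like a binary search.
    static Integer pickRandom(NavigableSet<Integer> set, int minKey, int maxKey) {
        int randomKey = minKey + RANDOM.nextInt(maxKey - minKey + 1);
        SortedSet<Integer> tail = set.tailSet(randomKey);
        // Handle a random key beyond the largest element by wrapping around to the first one
        return tail.isEmpty() ? set.first() : tail.first();
    }

    public static void main(String[] args) {
        NavigableSet<Integer> set = new TreeSet<>();
        for (int i = 0; i < 100; i++) {
            set.add(RANDOM.nextInt(1000));
        }
        System.out.println(pickRandom(set, 0, 999));
    }
}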
HashSet
HashSets basically consist of 2 things: an array of buckets and, per bucket, a linked list or tree for collisions, i.e. for when 2 elements are mapped to the same bucket. Getting a random element might then be done by accessing a random bucket (this would be random access) and then iterating over the elements in that bucket for a random number of steps.
Upvotes: 1