Pieter-Paul Kramer
Pieter-Paul Kramer

Reputation: 198

Can I check if Java 8 stream contains at least n elements

I have a Java 8 stream from which I want to (uniformly) randomly select an element. The stream can contain anywhere from zero to tens of thousands of elements.

I have implemented an algorithm that selects one using a MapReduce-like pattern, but for the very small streams it would probably be more efficient to just collect the items into a List and return one with a random index. For that I have to count them, however. Streams do have a count() method but that counts them all, I'm not really interested in the actual count, all I care about is whether it contains more than a to-be-determined number. Does anyone know if such a method exists? I can't find it but there might be something I'm overlooking or some clever trick for finding it anyway.

P.S.: I'm aware that sometimes it's not necessary to optimize code; but I would like to try it nonetheless, just for the experience. I'm a student.

P.P.S.: I've copied my algorithm here, in case anyone's interested (or wants to look for bugs, I haven't tested it yet ;-)

stream
    .parallel()
    .map(t -> new Pair<T, Integer>(t, 1))
    .reduce((Pair<T, Integer> t, Pair<T, Integer> u) -> {
        if (rand.nextDouble() <= (t.getValue1() / (double) (t.getValue1() + u.getValue1()))) {
            return new Pair<>(t.getValue0(), t.getValue1() + u.getValue1());
        } else {
            return new Pair<>(u.getValue0(), t.getValue1() + u.getValue1());
        }
    })
    .map(t -> t.getValue0());

(The pairs are from org.javatuples, now that Java supports functional programming-like interfaces the lack of tuples does become a bit painful).

Upvotes: 13

Views: 3603

Answers (3)

barfuin
barfuin

Reputation: 17494

The original question has already been answered, I believe, but I keep landing here while googling for "java stream at least n elements" or similar, so maybe this will still be helpful to some.

What helped me was the limit() method. We set it to the expected minimum, then count all elements. This will stop counting once the limit is reached. Here's a full example:

class Scratch
{
    public static void main(String[] args)
    {
        List<Integer> list1 = Arrays.asList(1, 2, 3);
        List<Integer> list2 = Arrays.asList(1, 2, 3, 4);

        System.out.println(streamContainsAtLeastNElements(list1.stream(), 4));
        // --> false
        System.out.println(streamContainsAtLeastNElements(list2.stream(), 4));
        // --> true
    }

    public static boolean streamContainsAtLeastNElements(Stream<?> stream, long minCount)
    {
        return stream.limit(minCount).count() == minCount;
    }
}

Note that it will consume your stream though. Also, it may still be slow if your stream implements some complex ordering routines. In that case, consider adding unordered().

Upvotes: 1

user2717921
user2717921

Reputation: 42

Your code does not return element from uniform distribution. It depends on the order, that stream provides elements to reduce method. In general case you can't consider that the order will not be the special one. Solving your task: if you have enough memory, it is possible to write RandomComparator (that saves previous results in Map), sort your stream with this comparator and get first element (don't use findAny). If stream is too large, it is possible to sample it with RandomFilter.

btw, if you have SIZED flag in your stream, task is trivial. Just get size, generate random index and make spip :)

Upvotes: 1

Daniel Hristov
Daniel Hristov

Reputation: 56

I suggest trying go get this info from the source of data for the stream. Where do you get the data for the stream from? If the source (some collection for example) can give you the number of elements you're set. If it's some producer function check what it does and whether it's possible to estimate the size upfront.

The moment I type "stream" I normally start thinking of a "recipe" of what do I want to do with this data, rather than the actual data. I think that's close to the way streams are designed (which tells why they don't provide way to count elements).

Best regards, Dido

Upvotes: 0

Related Questions