alturkovic

Reputation: 1120

Java finite stream termination

I am trying to make a streaming version of StringTokenizer, but I am having some problems terminating the stream correctly.

public Stream<String> tokenize(String text) {
    StringTokenizer tokenizer = new StringTokenizer(text);
    return Stream.generate(tokenizer::nextToken)
        .takeWhile(s -> tokenizer.hasMoreTokens());
}

But when I run this code, the last token is missing:

Stream<String> tokens = new DefaultTokenizer().tokenize("   a b   c d\te  f\n");
tokens.forEach(System.out::println);

Results in:

a
b
c
d
e

I have tried using Stream.iterate like this:

Stream.iterate(tokenizer.nextToken(), s -> tokenizer.hasMoreTokens(), s -> tokenizer.nextToken())

but the result is the same.

I am obviously terminating the stream as soon as the predicate hasMoreTokens fails, but by then the last token has already been consumed. How can I terminate the stream only after taking the last element?

Upvotes: 1

Views: 152

Answers (5)

Anonymous

Reputation: 86282

StringTokenizer is a legacy class, so you may want to reconsider using it. Edit: I deleted my solution from this answer since the one in the answer by magicmn is much better.

Edit: What went wrong in your code? You tried:

Stream.iterate(tokenizer.nextToken(), s -> tokenizer.hasMoreTokens(), s -> tokenizer.nextToken())

What happens here is that the stream uses s -> tokenizer.hasMoreTokens() to decide whether to include the current token, s, in the stream. By the time it does that for the final token, f, the token has already been drawn from the StringTokenizer, so hasMoreTokens() returns false and f is not included in the stream.
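For illustration only (this is a sketch, not my deleted solution): an iterate variant can sidestep the problem by seeding with the first token and testing the element itself rather than the tokenizer's state, with null as an end-of-input sentinel (Objects is java.util.Objects):

public Stream<String> tokenize(String text) {
    StringTokenizer tokenizer = new StringTokenizer(text);
    // Seed with the first token (or null for empty input). The predicate tests
    // the element itself, not the tokenizer's state, so the last token survives.
    return Stream.iterate(
            tokenizer.hasMoreTokens() ? tokenizer.nextToken() : null,
            Objects::nonNull,
            s -> tokenizer.hasMoreTokens() ? tokenizer.nextToken() : null);
}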

Upvotes: 2

magicmn

Reputation: 1914

Like all the other answers, I would avoid using StringTokenizer.

To avoid processing the whole string up front, you can use Pattern.splitAsStream(CharSequence) instead of the usual split method.

So in your case something like this should produce the same tokens as StringTokenizer; the string is split at any non-empty run of whitespace. Trim the input first, because a leading run of whitespace would otherwise produce an empty first token:

Pattern.compile("\\s+").splitAsStream(text.trim())
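As a drop-in sketch for the method in the question (precompiling the pattern as a constant is my own addition):

private static final Pattern WHITESPACE = Pattern.compile("\\s+");

public Stream<String> tokenize(String text) {
    // Trim first: a run of leading whitespace is a positive-width match
    // and would otherwise produce an empty first token.
    return WHITESPACE.splitAsStream(text.trim());
}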

Upvotes: 4

DuncG

Reputation: 15136

A small adjustment to your supplier, returning null once the tokens are exhausted, combined with takeWhile(Objects::nonNull), lets you keep using StringTokenizer:

public Stream<String> tokenize(String text) {
    StringTokenizer tokenizer = new StringTokenizer(text);
    return Stream.generate(() -> tokenizer.hasMoreTokens() ? tokenizer.nextToken() : null)
            .takeWhile(Objects::nonNull); // stop at the null sentinel
}
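With the input from the question, this now yields all six tokens:

Stream<String> tokens = new DefaultTokenizer().tokenize("   a b   c d\te  f\n");
tokens.forEach(System.out::println); // prints a, b, c, d, e, f on separate lines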

Upvotes: 2

Joakim Danielson

Reputation: 51890

StringTokenizer is a legacy class that is not recommended anymore. Here is an example using split() instead:

public Stream<String> tokenize(String text) {
    String[] split = text.split("\\s");
    return Arrays.stream(split).map(String::trim).filter(s -> !s.isEmpty());
}

Or, even more compact:

public Stream<String> tokenize(String text) {
    return Arrays.stream(text.split("\\s"))
        .map(String::trim)
        .filter(s -> !s.isEmpty());
}
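A variant sketch: splitting on "\\s+" collapses whole runs of whitespace, so the trim step is unnecessary and the filter only has to drop the empty first token produced when the input starts with whitespace:

public Stream<String> tokenize(String text) {
    // "\\s+" matches whole whitespace runs, so no interior empty strings appear;
    // only leading whitespace yields a single empty first token
    return Arrays.stream(text.split("\\s+"))
        .filter(s -> !s.isEmpty());
}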

Upvotes: 1

Ferry

Reputation: 414

Have you tried this?

return Stream.generate(tokenizer::nextToken).limit(tokenizer.countTokens());

This limits the stream to the total number of tokens.
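A sketch in the shape of the question's method; note that countTokens() counts only the tokens remaining, but since limit(...) is evaluated while the pipeline is built, before generate() draws anything, it equals the total count here:

public Stream<String> tokenize(String text) {
    StringTokenizer tokenizer = new StringTokenizer(text);
    // countTokens() is evaluated before any token is consumed,
    // so the limit equals the total number of tokens in the text
    return Stream.generate(tokenizer::nextToken).limit(tokenizer.countTokens());
}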

Upvotes: 1
