Reputation: 1120
I am trying to make a streaming version of StringTokenizer
but I am having some problems terminating the stream correctly.
```java
public Stream<String> tokenize(String text) {
    StringTokenizer tokenizer = new StringTokenizer(text);
    return Stream.generate(tokenizer::nextToken)
                 .takeWhile(s -> tokenizer.hasMoreTokens());
}
```
But when I run this code, the last token is missing:
```java
Stream<String> tokens = new DefaultTokenizer().tokenize(" a b c d\te f\n");
tokens.forEach(System.out::println);
```
Results in:
```
a
b
c
d
e
```
I have tried using Stream.iterate like this:
```java
Stream.iterate(tokenizer.nextToken(),
               s -> tokenizer.hasMoreTokens(),
               s -> tokenizer.nextToken())
```
but the result is the same.
I am obviously terminating the stream as soon as the predicate hasMoreTokens fails, but I still need the element that was drawn just before that. How can I terminate the stream only after taking the last element?
Upvotes: 1
Views: 152
Reputation: 86282
StringTokenizer is a legacy class, so you may want to reconsider using it. Edit: I deleted my solution from this answer since the one in the answer by magicmn is much better.
Edit: What went wrong in your code? You tried:
```java
Stream.iterate(tokenizer.nextToken(),
               s -> tokenizer.hasMoreTokens(),
               s -> tokenizer.nextToken())
```
What happens here is that the stream uses s -> tokenizer.hasMoreTokens() to determine whether to include the present token, s, in the stream. At the point it does that for the final token, f, it has obviously already drawn it from the StringTokenizer, so the StringTokenizer returns false for hasMoreTokens(), and f is not included in the stream.
Upvotes: 2
Reputation: 1914
Just like in all the other answers, I would avoid using StringTokenizer.

To avoid processing the whole string up front, you can use Pattern.splitAsStream(CharSequence) instead of the usual split method.

So in your case, something like this should give results similar to using StringTokenizer (the string is split at any non-empty sequence of whitespace):
```java
Pattern.compile("\\s+").splitAsStream(text)
```
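One caveat worth noting (my addition, not part of the original answer): if the input starts with whitespace, as the question's sample " a b c d\te f\n" does, splitAsStream emits a leading empty token. A minimal sketch that guards against that, wrapped in a hypothetical class name PatternTokenizer:

```java
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class PatternTokenizer {
    // compile once; \s+ matches any non-empty run of whitespace
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public Stream<String> tokenize(String text) {
        // trim() avoids a leading empty token when text starts with whitespace;
        // the filter also covers the all-whitespace / empty input case
        return WHITESPACE.splitAsStream(text.trim())
                         .filter(s -> !s.isEmpty());
    }
}
```

Unlike split(), this does not materialize the whole token array before streaming begins.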
Upvotes: 4
Reputation: 15136
A small adjustment to your supplier to return null at the end, combined with takeWhile(Objects::nonNull), will let you keep using StringTokenizer (this needs an import of java.util.Objects):
```java
public Stream<String> tokenize(String text) {
    StringTokenizer tokenizer = new StringTokenizer(text);
    return Stream.generate(() -> tokenizer.hasMoreTokens() ? tokenizer.nextToken() : null)
                 .takeWhile(Objects::nonNull);
}
```
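For reference, here is the same method wrapped in a runnable class (the class name SentinelTokenizer is my own) checked against the question's sample input:

```java
import java.util.Objects;
import java.util.StringTokenizer;
import java.util.stream.Stream;

public class SentinelTokenizer {
    public Stream<String> tokenize(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text);
        // the supplier returns null once the tokenizer is exhausted;
        // takeWhile(nonNull) then ends the stream after the last real token
        return Stream.generate(() -> tokenizer.hasMoreTokens() ? tokenizer.nextToken() : null)
                     .takeWhile(Objects::nonNull);
    }

    public static void main(String[] args) {
        new SentinelTokenizer().tokenize(" a b c d\te f\n")
                               .forEach(System.out::println); // prints a through f, one per line
    }
}
```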
Upvotes: 2
Reputation: 51890
StringTokenizer is a legacy class whose use is no longer recommended. Here is an example of using split() instead:
```java
public Stream<String> tokenize(String text) {
    String[] split = text.split("\\s");
    return Arrays.stream(split)
                 .map(String::trim)
                 .filter(s -> !s.isEmpty());
}
```
or, even more compact:
```java
public Stream<String> tokenize(String text) {
    return Arrays.stream(text.split("\\s"))
                 .map(String::trim)
                 .filter(s -> !s.isEmpty());
}
```
Upvotes: 1
Reputation: 414
Have you tried this?
```java
return Stream.generate(tokenizer::nextToken).limit(tokenizer.countTokens());
```
This limits the stream to the total number of tokens.
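One detail worth spelling out (my addition): countTokens() is evaluated once, while the pipeline is being built and before generate draws any tokens, so the limit matches the full token count. A minimal sketch, with a hypothetical class name CountTokenizer:

```java
import java.util.StringTokenizer;
import java.util.stream.Stream;

public class CountTokenizer {
    public Stream<String> tokenize(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text);
        // countTokens() runs here, before any nextToken() call,
        // so the limit equals the total number of tokens in text
        return Stream.generate(tokenizer::nextToken)
                     .limit(tokenizer.countTokens());
    }
}
```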
Upvotes: 1