Don

Reputation: 153

Stream of Strings isn't sorted?

I would like to find the set of all words in a file. This set should be sorted, and upper and lower case should not matter. Here is my approach:

public static Set<String> setOfWords(String fileName) throws IOException {

    Set<String> wordSet;
    Stream<String> stream = java.nio.file.Files.lines(java.nio.file.Paths.get(fileName));

    wordSet = stream
                .map(line -> line.split("[ .,;?!.:()]"))
                .flatMap(Arrays::stream)
                .sorted()
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    stream.close();
    return wordSet;
}

Test file:

This is a file with five lines.It has two sentences, and the word file is contained in multiple lines of this file. This file can be used for testing?

When printing the set, I get the following output:

Set of words: 
a
be
in
sentences
testing
this
for
multiple
is
it
used
two
the
can
with
contained
file
and
of
has
lines
five
word

Can anybody tell me why the set is not sorted in its natural order (lexicographic, for Strings)?

Thanks in advance

Upvotes: 4

Views: 2379

Answers (2)

Eran

Reputation: 393781

Since the ordering is case sensitive, you should map to lower case before sorting.

Besides that, you should collect the output into an ordered collection such as a List or some SortedSet implementation (though if you use a SortedSet there's no need to execute sorted(), since the Set will be sorted anyway).
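To illustrate the first point, here is a minimal sketch (the class and method names are made up for this example) that runs the same word list through both orderings of sorted() and toLowerCase(). It collects into a List so the encounter order stays visible. Collectors.toSet() makes no ordering guarantee, which is why the set in the question looks shuffled even though sorted() was called.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SortOrderDemo {

    // Sorts first, then lower-cases: the order reflects the ORIGINAL casing,
    // and upper-case letters sort before lower-case ones.
    static List<String> sortThenLower(List<String> words) {
        return words.stream()
                .sorted()
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    // Lower-cases first, then sorts: the order is lexicographic on the final values.
    static List<String> lowerThenSort(List<String> words) {
        return words.stream()
                .map(String::toLowerCase)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("file", "This", "a", "It");
        System.out.println(sortThenLower(words)); // [it, this, a, file]
        System.out.println(lowerThenSort(words)); // [a, file, it, this]
    }
}
```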

A List output:

List<String> wordSet = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .map(String::toLowerCase)
            .sorted()
            .collect(Collectors.toList());

EDIT:

As commented by Hank, if you want to eliminate duplicates in the output Collection, a List won't do, so you'll have to collect the elements into a SortedSet implementation.

A SortedSet output:

Set<String> wordSet = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .map(String::toLowerCase)
            .collect(Collectors.toCollection(TreeSet::new));
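Putting the SortedSet variant back into the method from the question could look roughly like the sketch below. The helper wordsOf and the class name are hypothetical, the try-with-resources block replaces the manual stream.close(), and the filter step is an assumption on my part: the split pattern yields empty strings where two delimiters are adjacent (for example ", "), and it seems unlikely those are wanted in the result.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordSet {

    // Hypothetical helper: turns a stream of lines into a sorted set of lower-case words.
    static SortedSet<String> wordsOf(Stream<String> lines) {
        return lines
                .map(line -> line.split("[ .,;?!.:()]"))
                .flatMap(Arrays::stream)
                .filter(word -> !word.isEmpty())  // assumption: empty tokens are unwanted
                .map(String::toLowerCase)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    static SortedSet<String> setOfWords(String fileName) throws IOException {
        // try-with-resources closes the file even if the pipeline throws
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return wordsOf(lines);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, Arrays.asList("This is a file.", "This file has two lines, and a word."));
        System.out.println(setOfWords(tmp.toString()));
        // [a, and, file, has, is, lines, this, two, word]
        Files.delete(tmp);
    }
}
```

Since the TreeSet sorts and deduplicates on its own, neither sorted() nor distinct() is needed in the pipeline.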

Upvotes: 5

Sleiman Jneidi

Reputation: 23329

You can use a sorted collection such as a TreeSet, with String.CASE_INSENSITIVE_ORDER as its Comparator:

Set<String> set = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .collect(Collectors.toCollection(()-> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));
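One behaviour of this approach worth knowing: the words are not converted to lower case. The TreeSet treats "This" and "this" as the same element and keeps whichever spelling it saw first. A small sketch, with a made-up class name and sample data:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class CaseInsensitiveSetDemo {

    // Collects into a TreeSet that compares elements while ignoring case.
    static Set<String> caseInsensitiveSet(List<String> words) {
        return words.stream()
                .collect(Collectors.toCollection(
                        () -> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));
    }

    public static void main(String[] args) {
        Set<String> set = caseInsensitiveSet(
                Arrays.asList("This", "file", "this", "A", "File"));
        // The first spelling of each word wins; later variants are ignored.
        System.out.println(set);                  // [A, file, This]
        // Lookups use the same comparator, so they ignore case as well.
        System.out.println(set.contains("THIS")); // true
    }
}
```

If the output should be all lower case, as in the question, add .map(String::toLowerCase) before collecting.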

Or you can sort the elements using a case-insensitive comparator and collect them into a collection that maintains insertion order:

List<String> list = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .sorted(String::compareToIgnoreCase)
            .distinct()
            .collect(Collectors.toList());
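A caveat on the List variant, as far as I can tell: distinct() compares elements with equals(), which is case sensitive for Strings, so "this" and "This" both survive even though the comparator treats them as equal. The sketch below (class and method names are made up) shows the effect, along with one possible fix that lower-cases the words before removing duplicates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctDemo {

    // Same pipeline shape as above: case-insensitive sort, then distinct().
    static List<String> sortedThenDistinct(List<String> words) {
        return words.stream()
                .sorted(String::compareToIgnoreCase)
                .distinct()                 // uses equals(), so it is case sensitive
                .collect(Collectors.toList());
    }

    // Possible fix: normalise the case first so that distinct() sees equal strings.
    static List<String> lowerDistinctSorted(List<String> words) {
        return words.stream()
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("this", "file", "This", "a");
        System.out.println(sortedThenDistinct(words));  // [a, file, this, This]
        System.out.println(lowerDistinctSorted(words)); // [a, file, this]
    }
}
```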

Upvotes: 7
