Reputation: 153
I would like to find the set of all words in a file. This set should be sorted. Upper and lower case don't matter. Here is my approach:
public static Set<String> setOfWords(String fileName) throws IOException {
    Set<String> wordSet;
    Stream<String> stream = java.nio.file.Files.lines(java.nio.file.Paths.get(fileName));
    wordSet = stream
        .map(line -> line.split("[ .,;?!.:()]"))
        .flatMap(Arrays::stream)
        .sorted()
        .map(String::toLowerCase)
        .collect(Collectors.toSet());
    stream.close();
    return wordSet;
}
Test file:
This is a file with five lines.It has two sentences, and the word file is contained in multiple lines of this file. This file can be used for testing?
When printing the set, I get the following output:
Set of words:
a
be
in
sentences
testing
this
for
multiple
is
it
used
two
the
can
with
contained
file
and
of
has
lines
five
word
Can anybody tell me why the set is not sorted in its natural order (lexicographic, for Strings)?
Thanks in advance.
Upvotes: 4
Views: 2379
Reputation: 393781
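Since the ordering is case sensitive, you should map to lower case before sorting.
Besides that, you should collect the output into an ordered collection such as a List or some SortedSet implementation. Collectors.toSet() makes no guarantee about iteration order (in practice it typically gives you a HashSet), so whatever order sorted() produced is lost when collecting. If you use a SortedSet there's no need to execute sorted(), since the Set will be sorted anyway.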
A List output:
List<String> wordSet = stream
    .map(line -> line.split("[ .,;?!.:()]"))
    .flatMap(Arrays::stream)
    .map(String::toLowerCase)
    .sorted()
    .collect(Collectors.toList());
EDIT:
As commented by Hank, if you want to eliminate duplicates in the output Collection, a List won't do, so you'll have to collect the elements into a SortedSet implementation.
A SortedSet output:
Set<String> wordSet = stream
    .map(line -> line.split("[ .,;?!.:()]"))
    .flatMap(Arrays::stream)
    .map(String::toLowerCase)
    .collect(Collectors.toCollection(TreeSet::new));
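For completeness, here is a minimal, self-contained sketch of the SortedSet approach. The class name SortedWords and the helper sortedWords are made up for illustration, and the helper takes a Stream of lines rather than a file name so it runs without a test file. Two small deviations from the snippet above, both my own assumptions about what is wanted: the regex ends with + so that runs of delimiters are treated as one, and empty tokens are filtered out.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SortedWords {

    // Hypothetical helper: collects the lower-cased words of all lines
    // into a TreeSet, which keeps them in natural (lexicographic) order.
    static SortedSet<String> sortedWords(Stream<String> lines) {
        return lines
                // "+" merges consecutive delimiters such as ", " into one split point
                .flatMap(line -> Arrays.stream(line.split("[ .,;?!:()]+")))
                // a line starting with a delimiter still yields one leading empty token
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        Stream<String> lines = Stream.of(
                "This is a file.It has two sentences,",
                "and the word file is here.");
        System.out.println(sortedWords(lines));
        // prints [a, and, file, has, here, is, it, sentences, the, this, two, word]
    }
}
```

When reading from a real file, you could pass Files.lines(Paths.get(fileName)) instead, ideally opened in a try-with-resources block so the stream is closed even if an exception is thrown.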
Upvotes: 5
Reputation: 23329
You can use a sorted collection like a TreeSet using String.CASE_INSENSITIVE_ORDER as a Comparator:
Set<String> set = stream
    .map(line -> line.split("[ .,;?!.:()]"))
    .flatMap(Arrays::stream)
    .collect(Collectors.toCollection(() -> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));
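One thing worth knowing about this variant: a TreeSet with a case-insensitive comparator treats "This" and "this" as the same element, and it keeps whichever spelling was added first. The small sketch below illustrates that; the class name CaseInsensitiveSetDemo and the helper collect are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CaseInsensitiveSetDemo {

    // Hypothetical helper: adds the words to a TreeSet ordered by
    // String.CASE_INSENSITIVE_ORDER, so case variants count as duplicates.
    static Set<String> collect(List<String> words) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(words);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(collect(Arrays.asList("This", "file", "this", "File", "a")));
        // prints [a, file, This] : the first-seen spelling of each word is kept
    }
}
```

If you want the output to be uniformly lower case, as in the question, you would still need the map(String::toLowerCase) step.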
Or you can sort the elements using a case-insensitive comparator and collect them into a collection that maintains insertion order:
List<String> list = stream
    .map(line -> line.split("[ .,;?!.:()]"))
    .flatMap(Arrays::stream)
    .sorted(String::compareToIgnoreCase)
    .distinct()
    .collect(Collectors.toList());
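A caveat on the List variant: distinct() compares elements with equals(), which is case sensitive, so "This" and "this" both survive even though the sort ignores case. If case really should not matter, lower-casing before distinct() is one way around it. The sketch below compares the two; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctCaseDemo {

    // Same pipeline shape as above: case-insensitive sort, then distinct().
    static List<String> sortedThenDistinct(List<String> words) {
        return words.stream()
                .sorted(String::compareToIgnoreCase)
                .distinct() // uses equals(), so "This" and "this" are different
                .collect(Collectors.toList());
    }

    // Possible alternative: normalise the case first, then de-duplicate and sort.
    static List<String> lowerCasedFirst(List<String> words) {
        return words.stream()
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("This", "file", "this");
        System.out.println(sortedThenDistinct(words)); // prints [file, This, this]
        System.out.println(lowerCasedFirst(words));    // prints [file, this]
    }
}
```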
Upvotes: 7