Anoop

Reputation: 5720

Extracting Map<K, Multiset<V>> from Stream of Streams in Java 8

I have a Stream of Streams of words (this format is not set by me and cannot be changed). For example:

Stream<String> doc1 = Stream.of("how", "are", "you", "doing", "doing", "doing");
Stream<String> doc2 = Stream.of("what", "what", "you", "upto");
Stream<String> doc3 = Stream.of("how", "are", "what", "how");
Stream<Stream<String>> docs = Stream.of(doc1, doc2, doc3);

I'm trying to get this into a structure of Map<String, Multiset<Integer>> (or its corresponding stream, as I want to process it further), where the key String is the word itself and the Multiset<Integer> holds the number of times that word appears in each document (0's should be excluded). Multiset is a Google Guava class (not from java.util).

For example:

how   -> {1, 2}  // once in doc1, twice in doc3, none in doc2 (so doc2's count is not included)
are   -> {1, 1}  // once in doc1 and once in doc3
you   -> {1, 1}  // once in doc1 and once in doc2
doing -> {3}     // three times in doc1, none in the others
what  -> {2, 1}  // twice in doc2 and once in doc3
upto  -> {1}     // once in doc2

What is a good way to do this in Java 8?

I tried using flatMap, but the inner Stream greatly limits the options I have.

Upvotes: 11

Views: 1007

Answers (4)

user_3380739

Reputation: 1254

Here is a simple solution using AbacusUtil:

Map<String, List<Integer>> m = Stream.of(doc1, doc2, doc3)
          .flatMap(d -> d.toMultiset().stream()).collect(Collectors.toMap2());

Upvotes: 1

fps

Reputation: 34460

Since you are using Guava, you could take advantage of its utilities for working with streams, as well as its Table structure. Here's the code:

Table<String, Long, Long> result =
    Streams.mapWithIndex(docs, (doc, i) -> doc.map(word -> new SimpleEntry<>(word, i)))
        .flatMap(Function.identity())
        .collect(Tables.toTable(
            Entry::getKey, Entry::getValue, p -> 1L, Long::sum, HashBasedTable::create));

Here I'm using the Streams.mapWithIndex method to assign an index to each inner stream. Within the map function, I'm transforming each word to a pair that consists of the word and the index, so that I can later know to which document the word belongs.

Then I'm flat-mapping the (word, index) pairs of all documents into one stream, and finally collecting all the pairs into a Guava Table by means of the Tables.toTable collector. The row is the word, the column is the document (represented by its index), and the value is the number of times the word occurs in that document (each (word, index) pair is mapped to 1L, and Long::sum merges the duplicates).
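For readers without Guava 21, the same word/document/count layout can be sketched with the JDK alone. This is a swapped-in technique, not the answer's code: it assumes the outer stream may be collected into a List, so each inner stream can be paired with its position through IntStream.range (replacing Streams.mapWithIndex), and it uses a nested TreeMap (word -> document index -> count) as a hypothetical stand-in for the Table. The class name IndexedWordTable is illustrative only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class IndexedWordTable {

    // Nested-map stand-in for Table<String, Long, Long>: word -> (document index -> count).
    static Map<String, Map<Long, Long>> wordByDocument(Stream<Stream<String>> docs) {
        // Collect the outer stream so each inner stream can be looked up by position.
        List<Stream<String>> docList = docs.collect(Collectors.toList());
        return IntStream.range(0, docList.size())
                .boxed()
                // Pair every word with the index of the document it came from.
                .flatMap(i -> docList.get(i)
                        .map(word -> new SimpleEntry<String, Long>(word, i.longValue())))
                // Row = word, column = document index, value = number of occurrences.
                .collect(Collectors.groupingBy(
                        SimpleEntry::getKey,
                        TreeMap::new,
                        Collectors.groupingBy(SimpleEntry::getValue, TreeMap::new, Collectors.counting())));
    }

    public static void main(String[] args) {
        Stream<Stream<String>> docs = Stream.of(
                Stream.of("how", "are", "you", "doing", "doing", "doing"),
                Stream.of("what", "what", "you", "upto"),
                Stream.of("how", "are", "what", "how"));
        // {are={0=1, 2=1}, doing={0=3}, how={0=1, 2=2}, upto={1=1}, what={1=2, 2=1}, you={0=1, 1=1}}
        System.out.println(wordByDocument(docs));
    }
}
```

Documents in which a word does not occur simply have no column entry, so zero counts never appear, just as with the Table.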

You have all the info you need in the result table, but if you still need a Map<String, Multiset<Integer>>, you could do it this way:

Map<String, Multiset<Long>> map = Maps.transformValues(
    result.rowMap(),
    m -> HashMultiset.create(m.values()));

Note: you need Guava 21 or later for this to work (Streams.mapWithIndex and Tables.toTable were added in 21.0).
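Without Guava, the Maps.transformValues step has a direct JDK analogue once the rows sit in a plain nested map. A minimal sketch, assuming a hypothetical Map<String, Map<Long, Long>> (word -> document index -> count) standing in for result.rowMap(); unlike Maps.transformValues, which returns a lazy view, this copies the counts eagerly, and it uses List<Long> instead of Multiset<Long>. RowsToCounts is an illustrative name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RowsToCounts {

    // Copies each row's values (the per-document counts) into a list, keeping column order.
    static Map<String, List<Long>> countsPerWord(Map<String, Map<Long, Long>> rows) {
        Map<String, List<Long>> out = new LinkedHashMap<>(); // preserves the input's key order
        for (Map.Entry<String, Map<Long, Long>> row : rows.entrySet()) {
            out.put(row.getKey(), new ArrayList<>(row.getValue().values()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical row: "how" appears once in document 0 and twice in document 2.
        Map<String, Map<Long, Long>> rows = new TreeMap<>();
        Map<Long, Long> how = new TreeMap<>();
        how.put(0L, 1L);
        how.put(2L, 2L);
        rows.put("how", how);
        System.out.println(countsPerWord(rows)); // {how=[1, 2]}
    }
}
```

Because the inner maps are sorted by document index, each list comes out in document order; with an unsorted inner map that order would not be guaranteed.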

Upvotes: 3

Sean Van Gorder

Reputation: 3453

Map<String, Multiset<Integer>> result = docs
        .map(s -> s.collect(Collectors.toCollection(HashMultiset::create)))
        .flatMap(m -> m.entrySet().stream())
        .collect(Collectors.groupingBy(Multiset.Entry::getElement,
                Collectors.mapping(Multiset.Entry::getCount,
                        Collectors.toCollection(HashMultiset::create))));

// {upto=[1], how=[1, 2], doing=[3], what=[1, 2], are=[1 x 2], you=[1 x 2]}

Multiset is useful for getting the word count, but not really necessary for storing the counts. If you're fine with Map<String, List<Integer>>, just replace the last line with Collectors.toList())));.
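If Multiset-style semantics are still wanted for the result (for each word: how many documents contain it once, twice, and so on) but Guava is not available, a nested map of count -> multiplicity can play that role. A minimal sketch under that assumption: it counts words inside each document first, so document boundaries survive the flatMap, then tallies the per-document counts for each word. TreeMap is used only to make the printed order predictable, and CountHistogram is an illustrative name.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CountHistogram {

    // word -> (count within a document -> number of documents with that count).
    // A plain-JDK stand-in for Map<String, Multiset<Long>>; zero counts never appear.
    static Map<String, Map<Long, Long>> histogram(Stream<Stream<String>> docs) {
        return docs
                // Count words inside each document first, so document boundaries are kept.
                .map(doc -> doc.collect(Collectors.groupingBy(Function.identity(), Collectors.counting())))
                .flatMap(counts -> counts.entrySet().stream())
                // Group the per-document counts by word, then count equal counts.
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.groupingBy(Map.Entry::getValue, TreeMap::new, Collectors.counting())));
    }

    public static void main(String[] args) {
        Stream<Stream<String>> docs = Stream.of(
                Stream.of("how", "are", "you", "doing", "doing", "doing"),
                Stream.of("what", "what", "you", "upto"),
                Stream.of("how", "are", "what", "how"));
        // {are={1=2}, doing={3=1}, how={1=1, 2=1}, upto={1=1}, what={1=1, 2=1}, you={1=2}}
        System.out.println(histogram(docs));
    }
}
```

Here are={1=2} carries the same information as the Multiset output are=[1 x 2] shown above.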

Or, since you're using Guava anyway, why not a ListMultimap?

ListMultimap<String, Integer> result = docs
        .map(s -> s.collect(Collectors.toCollection(HashMultiset::create)))
        .flatMap(m -> m.entrySet().stream())
        .collect(ArrayListMultimap::create,
                (r, e) -> r.put(e.getElement(), e.getCount()),
                Multimap::putAll);

// {upto=[1], how=[1, 2], doing=[3], what=[2, 1], are=[1, 1], you=[1, 1]}
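The ListMultimap version relies on the three-argument Stream.collect (supplier, accumulator, combiner). The same mutable reduction can be sketched with a plain Map<String, List<Long>> in place of the multimap; this is a substitution for illustration, not Guava's API. It counts with Collectors.counting() (so the counts are Long) instead of HashMultiset, and it uses LinkedHashMap so the result follows the order in which words are first seen. ListMultimapStandIn is an illustrative name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ListMultimapStandIn {

    static Map<String, List<Long>> countsPerWord(Stream<Stream<String>> docs) {
        return docs
                // Per-document counts; LinkedHashMap keeps first-occurrence order within a document.
                .map(doc -> doc.collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting())))
                .flatMap(counts -> counts.entrySet().stream())
                .collect(
                        // Supplier: a fresh, insertion-ordered result map.
                        LinkedHashMap::new,
                        // Accumulator: append this document's count to the word's list.
                        (map, e) -> map.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue()),
                        // Combiner: only used by parallel streams; merges partial results.
                        (left, right) -> right.forEach(
                                (k, v) -> left.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v)));
    }

    public static void main(String[] args) {
        Stream<Stream<String>> docs = Stream.of(
                Stream.of("how", "are", "you", "doing", "doing", "doing"),
                Stream.of("what", "what", "you", "upto"),
                Stream.of("how", "are", "what", "how"));
        // {how=[1, 2], are=[1, 1], you=[1, 1], doing=[3], what=[2, 1], upto=[1]}
        System.out.println(countsPerWord(docs));
    }
}
```

As with the multimap, each list is in document order because the flattened stream is processed sequentially in encounter order.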

Upvotes: 3

Eugene

Reputation: 120968

 Map<String, List<Long>> map = docs.flatMap(
            inner -> inner.collect(
                    Collectors.groupingBy(Function.identity(), Collectors.counting()))
                    .entrySet()
                    .stream())
            .collect(Collectors.groupingBy(
                    Entry::getKey,
                    Collectors.mapping(Entry::getValue, Collectors.toList())));

System.out.println(map);

// {upto=[1], how=[1, 2], doing=[3], what=[2, 1], are=[1, 1], you=[1, 1]}

Upvotes: 10
