Evgenij Reznik

Reputation: 18614

Group by main words

Consider the following TreeSet:

This set has two main words, blue and red, along with various key words.
I need to group by those main words so that I get a list with all possible key words. Something like:

I think the steps should be as follows:

  1. Detect main word
    • the list is in alphabetical order, so the main words occur next to each other
    • it can only be the first word in an entry
    • it consists of at least 3 letters
    • not every entry has a main word (entries such as "some words" and "useless words" should just be skipped)
  2. Group by main word
    • some kind of "merging": take all entries with the same main word and remove it from every entry, so that only the remaining key words are left
      • blue flower
      • blue big car
      • blue hat 123
    • in this case the key words: flower, big, car, hat, 123 are left

Could somebody please suggest how to accomplish this and what I would need for it?

Upvotes: 1

Views: 58

Answers (1)

Stewart

Reputation: 18303

I don't think you need regex. Split each string on whitespace using String.split(" "), and then compare the first item against your list of "main" words.

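import java.util.*;

// Fill this set with the entries from the question.
TreeSet<String> originalSet = new TreeSet<>();
List<String> mainWords = Arrays.asList("blue", "red");

// Start with one empty key-word set per known main word.
Map<String, Set<String>> words = new HashMap<>();
for (String mainWord : mainWords) {
    words.put(mainWord, new HashSet<>());
}

for (String line : originalSet) {
    String[] items = line.split(" ");
    // Entries whose first word is not a main word are skipped.
    if (words.containsKey(items[0])) {
        // Everything after the main word is a key word.
        for (int i = 1; i < items.length; i++) {
            words.get(items[0]).add(items[i]);
        }
    }
}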
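
If the main words are not known up front, they could be derived from the set first. The following is only a sketch under an assumption of mine, not something the question states: a main word is a first word of at least 3 letters that starts at least two entries. The class name MainWordGrouper, its method names, and the sample entries (taken from the question's bullet list) are all illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class MainWordGrouper {

    // Assumption: a main word is a first word of 3+ letters shared by at least two entries.
    static Set<String> detectMainWords(Iterable<String> entries) {
        // Count how often each first word occurs.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String entry : entries) {
            String first = entry.trim().split("\\s+")[0];
            counts.merge(first, 1, Integer::sum);
        }
        Set<String> mainWords = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String word = e.getKey();
            boolean lettersOnly = word.chars().allMatch(Character::isLetter);
            if (e.getValue() >= 2 && word.length() >= 3 && lettersOnly) {
                mainWords.add(word);
            }
        }
        return mainWords;
    }

    // Collects, per main word, every remaining word of the entries that start with it.
    static Map<String, Set<String>> group(Iterable<String> entries, Set<String> mainWords) {
        Map<String, Set<String>> grouped = new LinkedHashMap<>();
        for (String entry : entries) {
            String[] items = entry.trim().split("\\s+");
            if (!mainWords.contains(items[0])) {
                continue; // no main word: skip the entry
            }
            Set<String> keyWords = grouped.computeIfAbsent(items[0], k -> new LinkedHashSet<>());
            for (int i = 1; i < items.length; i++) {
                keyWords.add(items[i]);
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Sample entries assumed from the question's bullet list.
        SortedSet<String> entries = new TreeSet<>(Arrays.asList(
                "blue flower", "blue big car", "blue hat 123", "some words", "useless words"));
        Set<String> mainWords = detectMainWords(entries);
        System.out.println(group(entries, mainWords)); // prints {blue=[big, car, flower, hat, 123]}
    }
}
```

Counting first words in a map avoids relying on the sorted order at all, and the LinkedHashSet keeps key words in the order they are encountered. If a main word may legitimately appear in only one entry, the "at least two entries" rule has to be replaced by an explicit list, as in the loop above.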

Upvotes: 1
