Amir Sa

Reputation: 251

Regular expression to capture repeated words (more than 2 repetitions in a text)

I would like to write a Java program that finds the words repeated more than 2 times in a text.

For instance, given the text "the blue book over The red pen is the biggest book I ever seen", the result should be the:3.

What would be the proper regular expression pattern for this?

Upvotes: 0

Views: 264

Answers (2)

anubhava

Reputation: 785186

Rather than trying to solve this problem with a regex, I would suggest the following algorithm:

  1. Split your sentence into words (on whitespace) and store their lowercase versions in a List<String>.
  2. Declare a map as HashMap<String, Integer>.
  3. Iterate over the word list, updating the map for each word.
  4. If the map has no entry for the word, add one with key=word, value=1.
  5. Otherwise, increment the value by 1; this gives you the frequency of each word.
  6. Whenever a word's frequency goes above 2, add that word to an output HashSet<String>.
  7. At the end of the loop, print the HashSet<String>.
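
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name RepeatedWords and the method name wordsRepeatedMoreThanTwice are made up for this example, and the word list is iterated directly instead of being stored in a separate List<String> first. Steps 4 and 5 are folded into a single Map.merge call.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical class name, chosen only for this sketch.
class RepeatedWords {

    // Returns every word that occurs more than 2 times, ignoring case.
    static Set<String> wordsRepeatedMoreThanTwice(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // LinkedHashSet keeps the order in which words crossed the threshold.
        Set<String> repeated = new LinkedHashSet<>();

        // Step 1: lowercase the text and split it on runs of whitespace.
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            // Steps 4 and 5: insert 1 for a new word, otherwise add 1.
            int frequency = counts.merge(word, 1, Integer::sum);
            // Step 6: remember the word once its frequency goes above 2.
            if (frequency > 2) {
                repeated.add(word);
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        String text = "the blue book over The red pen is the biggest book I ever seen";
        // Step 7: print the collected words.
        System.out.println(wordsRepeatedMoreThanTwice(text));
    }
}
```

Note that splitting on whitespace alone leaves punctuation attached to words, so "seen." and "seen" would be counted as two different words.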

Upvotes: 1

Serge Ballesta

Reputation: 148930

There is no need for regexes, except for splitting the text into words. After that, you just have to use a Map, with the word as the key and the number of repetitions as the value.

When done, you just scan the Map to find the most repeated word, or every word whose count is above 2.
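
A possible sketch of this approach, where the only regex is the one used for splitting. The names WordFrequency, countWords and repeatedMoreThan are invented for the example, and the choice of \W+ as the separator is an assumption: by default it treats anything other than ASCII letters, digits and underscore as a word boundary, so it would need adjusting for text with accented or non-Latin letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name, chosen only for this sketch.
class WordFrequency {

    // Builds the word -> repetition count map, ignoring case.
    static Map<String, Integer> countWords(String text) {
        // TreeMap keeps the words sorted, which makes the output deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        // The regex is used only to split the text into words.
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) { // a leading separator yields one empty token
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Scans the map and formats every word seen more than `threshold` times.
    static List<String> repeatedMoreThan(Map<String, Integer> counts, int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > threshold) {
                result.add(entry.getKey() + ":" + entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "the blue book over The red pen is the biggest book I ever seen.";
        System.out.println(repeatedMoreThan(countWords(text), 2)); // prints [the:3]
    }
}
```

Because the counting and the scan are separate steps, the same map can also be scanned for the single most repeated word, or with a different threshold, without re-reading the text.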

Upvotes: 0
