md shoaib

Reputation: 135

Regular expression to find repeated words in a sentence

I was trying to write a regular expression to find repeated words in a sentence. I tried this expression:

\b(\w+)\b.*?\1

to pick out the three occurrences of "Hello" and the two occurrences each of "are" and "you" in the sentence "Hello how in the Hello world are you ? are you okay? Hello". I know it is clearly wrong, since each match covers a whole group of words instead of one particular word.
Could you correct my expression or suggest your own solution?
I'm using the Matcher class and trying to count the occurrences of a given word by incrementing a count variable inside a while (matcher.find()) loop.
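
A counting loop of the kind described above might look like the following minimal sketch. The class name WordOccurrenceCounter and the method countOccurrences are hypothetical, chosen only for illustration, and the sketch assumes the word to count is known in advance and that matching is case-sensitive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts whole-word occurrences of one given word.
class WordOccurrenceCounter {
    static int countOccurrences(String sentence, String target) {
        // \b anchors the match to word boundaries; Pattern.quote guards
        // against regex metacharacters in the target word.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(target) + "\\b");
        Matcher matcher = pattern.matcher(sentence);
        int count = 0;
        while (matcher.find()) { // each successful find() is one occurrence
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sentence = "Hello how in the Hello world are you ? are you okay? Hello";
        System.out.println(countOccurrences(sentence, "Hello"));
        System.out.println(countOccurrences(sentence, "are"));
    }
}
```

This only works when the word is already known, which is why it does not by itself answer the question of finding which words are repeated.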

Upvotes: 1

Views: 2317

Answers (2)

Sweeper

Reputation: 271030

Regex isn't really suitable for a job like this. Regexes don't tend to count things. You can do this with the help of regex, but it's very difficult, if not impossible, to do it with regex alone.

Here's my attempt:

import java.util.Arrays;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

String sentence = "Hello how in the Hello world are you ? are you okay? Hello";
String[] words = Pattern.compile("\\W+").split(sentence); // split the sentence into words

// Group identical words together, then keep only those that occur more than once
Map<String, Integer> repeated = Arrays.stream(words)
        .collect(Collectors.groupingBy(x -> x))
        .entrySet().stream()
        .filter(x -> x.getValue().size() > 1) // drop the words that are not repeated
        .collect(Collectors.toMap(x -> x.getKey(), x -> x.getValue().size()));
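
If you would rather keep the matcher.find() loop from the question, a variation along the following lines should also work. It is only a sketch: the class name RepeatedWordCounter and the method findRepeated are made up here, and it assumes a "word" is any run of \w characters and that counting is case-sensitive.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the regex only finds words, the map does the counting.
class RepeatedWordCounter {
    static Map<String, Integer> findRepeated(String sentence) {
        // LinkedHashMap keeps the words in order of first appearance
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile("\\w+").matcher(sentence);
        while (matcher.find()) {
            counts.merge(matcher.group(), 1, Integer::sum); // increment the tally
        }
        counts.values().removeIf(n -> n < 2); // keep only words seen at least twice
        return counts;
    }

    public static void main(String[] args) {
        String sentence = "Hello how in the Hello world are you ? are you okay? Hello";
        System.out.println(findRepeated(sentence));
    }
}
```

For the sample sentence this leaves Hello with a count of 3, and are and you with a count of 2 each.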

Upvotes: 2

Michał Turczyn

Reputation: 37347

Try this pattern: (?<=\b| )([^ ]+)(?= |$).+(\1)
It detects the first word that occurs more than once in a string.
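
To try the pattern from Java, something like the sketch below could be used. The class name FirstRepeatedWord and the method find are hypothetical. The only change to the pattern is that the backslashes are doubled for a Java string literal.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern above.
class FirstRepeatedWord {
    private static final Pattern REPEATED =
            Pattern.compile("(?<=\\b| )([^ ]+)(?= |$).+(\\1)");

    // Returns the first word that appears again later in the sentence, if any.
    static Optional<String> find(String sentence) {
        Matcher matcher = REPEATED.matcher(sentence);
        // Group 1 holds the word itself; the overall match runs on to its last repeat.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sentence = "Hello how in the Hello world are you ? are you okay? Hello";
        System.out.println(find(sentence).orElse("(no repeated word)"));
    }
}
```

Note that the trailing \1 is not anchored to a word boundary, so it can also match inside a longer word (for example "are" inside "area"). The pattern also reports only the first repeated word, not a count for each one.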


Upvotes: 0
