Reputation: 135
I was trying to write a regular expression to find repeated words in a sentence. I tried using this expression:
\b(\w+)\b.*?\1
to select 3x 'hello', 2x 'are', and 2x 'you' from the sentence "Hello how in the Hello world are you ? are you okay? Hello", which I know is clearly wrong, since it matches a whole group of words instead of one particular word!
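To see what the expression above actually does, here is a small sketch that runs it in a find() loop and prints each full match. The class and method names (OriginalRegexDemo, matches) are made up for illustration; the pattern is the asker's, with backslashes doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexDemo {
    // The asker's expression; every backslash is doubled inside a Java string literal.
    static final Pattern ORIGINAL = Pattern.compile("\\b(\\w+)\\b.*?\\1");

    // Collects every full match returned by repeated calls to find().
    static List<String> matches(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(text);
        while (m.find()) {
            // group() is the whole match: a word plus everything up to its next occurrence
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "Hello how in the Hello world are you ? are you okay? Hello";
        for (String match : matches(sentence)) {
            System.out.println("[" + match + "]");
        }
        // prints:
        // [Hello how in the Hello]
        // [are you ? are]
    }
}
```

Each match consumes the text between a word and its next occurrence, so the second 'Hello', the first 'you' and the last 'Hello' are never reported on their own, and nothing is counted.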
So could you correct my expression or come up with your own solution?
I'm using the Matcher class to find the number of occurrences of a given word, incrementing a count variable inside a while (matcher.find()) loop.
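For reference, a minimal sketch of the Matcher counting loop described in the question, assuming the word to count is known in advance. The helper name countWord is hypothetical, and the case-insensitive flag is an assumption based on the question asking for 'hello' while the sentence contains 'Hello'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {
    // Counts whole-word occurrences of word in text (hypothetical helper).
    // Pattern.quote() escapes any regex metacharacters in the word, and the \b
    // anchors stop e.g. "are" from matching inside a longer word such as "care".
    // CASE_INSENSITIVE is an assumption so that "hello" also matches "Hello".
    static int countWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) { // each find() resumes after the previous match
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sentence = "Hello how in the Hello world are you ? are you okay? Hello";
        for (String w : new String[] {"hello", "are", "you"}) {
            System.out.println(w + ": " + countWord(sentence, w));
        }
        // hello: 3
        // are: 2
        // you: 2
    }
}
```

This counts one known word per pattern; finding which words repeat in the first place still needs a separate step, which is what the answers below address.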
Upvotes: 1
Views: 2317
Reputation: 271030
Regex isn't really suitable for a job like this: regular expressions match patterns, they don't count things. You can do this with the help of regex, but it's very difficult, if not impossible, to do it with regex alone.
Here's my attempt:
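```java
import java.util.Arrays;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

String sentence = "Hello how in the Hello world are you ? are you okay? Hello";
String[] words = Pattern.compile("\\W+").split(sentence); // split the sentence into words
Map<String, Integer> counts = Arrays.stream(words)
        .collect(Collectors.groupingBy(x -> x)) // group identical words together
        .entrySet().stream()
        .filter(x -> x.getValue().size() != 1) // remove the words that are not repeated
        .collect(Collectors.toMap(x -> x.getKey(), x -> x.getValue().size()));
// counts holds Hello=3, are=2, you=2 (HashMap order is not guaranteed)
```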
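A possible variant of the same split-and-group idea, sketched under two assumptions that are not part of the answer above: words should be compared case-insensitively, and the result should list words in order of first appearance. It uses Collectors.counting() with a LinkedHashMap instead of building lists and taking their size; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RepeatedWords {
    // Returns repeated words (lower-cased) with their counts, in order of first appearance.
    static Map<String, Long> repeated(String sentence) {
        Map<String, Long> counts = Pattern.compile("\\W+")
                .splitAsStream(sentence.toLowerCase(Locale.ROOT)) // assumption: "Hello" and "hello" are the same word
                .filter(w -> !w.isEmpty()) // a leading separator would produce an empty first token
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
        counts.values().removeIf(n -> n < 2); // keep only words that occur more than once
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(repeated("Hello how in the Hello world are you ? are you okay? Hello"));
        // {hello=3, are=2, you=2}
    }
}
```

Counting in the downstream collector avoids materialising a list per word, and the LinkedHashMap keeps the printed order stable.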
Upvotes: 2
Reputation: 37347
Try this pattern: (?<=\b| )([^ ]+)(?= |$).+(\1)
It detects the first word that occurs more than once in a string.
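A minimal sketch of how that pattern might be used from Java, assuming you only want the first repeated word (group 1). Note that every backslash has to be doubled inside a Java string literal. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstRepeatedWord {
    // The pattern from the answer above, with backslashes doubled for a Java string literal.
    static final Pattern REPEAT = Pattern.compile("(?<=\\b| )([^ ]+)(?= |$).+(\\1)");

    // Returns the first word that occurs again later in the text, or null if there is none.
    static String firstRepeated(String text) {
        Matcher m = REPEAT.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeated("Hello how in the Hello world are you ? are you okay? Hello"));
        // Hello
    }
}
```

Because .+ is greedy, the match here runs from the first 'Hello' all the way to the last one, so a further find() call finds nothing: the pattern reports one repeated word, not all of them, and it does not count occurrences. The backreference (\1) is also not anchored at a word end, so a later word that merely starts with the same text would count as a repeat.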
Upvotes: 0