Reputation: 1899
What is the regular expression to find words that are repeated on the same line?
I've tried some expressions that I found on Stack Overflow, such as this, but none is working correctly.
The result I want to achieve:
Upvotes: 4
Views: 15087
Reputation: 7767
This regex finds the words you want to highlight. (The example is in JavaScript, and it is easy to test in the browser's JavaScript console.)
s = "It's a foo and a bar and a bar and a foo too.";
a = s.match(/\b(\w+)\b(?=.*\b\1\b)/g);
This returns an array of the repeated words; the same word may appear in it more than once. (If nothing is repeated, match returns null.)
Next, build a second regex from those words:
re = new RegExp('\\b(' + a.join('|') + ')\\b', 'g');
And that should suffice to highlight all occurrences:
out = s.replace(re, function(m) { return '<b>' + m + '</b>' });
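Putting the three steps together, here is a minimal sketch that also handles the case where nothing is repeated and removes duplicates before building the alternation. The function name highlightRepeats is made up for this illustration; it is not part of any library.

```javascript
// Hypothetical helper: wraps every word that occurs more than once on the line in <b> tags.
function highlightRepeats(line) {
  // Step 1: words that occur again later on the same line (may contain duplicates).
  const found = line.match(/\b(\w+)\b(?=.*\b\1\b)/g);
  if (found === null) return line; // no repeated words, nothing to highlight

  // Step 2: de-duplicate, then build an alternation of the repeated words.
  // \w+ only yields letters, digits and underscores, so no regex escaping is needed.
  const unique = [...new Set(found)];
  const re = new RegExp('\\b(' + unique.join('|') + ')\\b', 'g');

  // Step 3: wrap every occurrence, including the last one of each word.
  return line.replace(re, (m) => '<b>' + m + '</b>');
}

const out = highlightRepeats("It's a foo and a bar and a bar and a foo too.");
console.log(out);
```

The lookahead alone misses the last occurrence of each word, which is why the second pass with the alternation is needed.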
Upvotes: 20
Reputation: 5836
If you want to find a word that is repeated directly after itself, for example "went went" and "to to to" in
Sam went went to to to his business
you can use this regex:
s = "Sam went went to to to his business";
a = s.match(/\b(\w+)(\s\1)+\b/g);
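As a quick check of what this pattern returns, and one thing you might do with it, here is a small sketch. Collapsing the repeats with replace is my own assumption about a likely use; the question only asks to find them.

```javascript
const sentence = "Sam went went to to to his business";
const consecutive = /\b(\w+)(\s\1)+\b/g;

// Each match is the whole run of repeated words.
const runs = sentence.match(consecutive);
console.log(runs); // [ 'went went', 'to to to' ]

// Assumed use: collapse each run to a single occurrence via the first group.
const collapsed = sentence.replace(consecutive, '$1');
console.log(collapsed); // Sam went to his business
```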
Upvotes: 1
Reputation: 1053
You can use this regex to find the same word repeated consecutively.
For example: "My name is Prince Prince, and I love cats."
The regex below will find Prince Prince. It is the simplest version, but without \b anchors it can also match inside longer words, for example "is is" in "this is".
(\w+)(\s\1)+
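To see how this pattern behaves at word edges, here is a short comparison against a variant with \b anchors. The variable names are only for illustration.

```javascript
const loose = /(\w+)(\s\1)+/g;      // the pattern as given above
const strict = /\b(\w+)(\s\1)+\b/g; // same pattern with word-boundary anchors

// Both find the intended repeat.
const princeLoose = "My name is Prince Prince, and I love cats.".match(loose);
console.log(princeLoose); // [ 'Prince Prince' ]

// Without anchors, the tail of "this" plus "is" counts as a repeat.
const falseHit = "this is fine".match(loose);
console.log(falseHit); // [ 'is is' ]

// With anchors, there is no match, so match() returns null.
const noHit = "this is fine".match(strict);
console.log(noHit); // null
```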
Upvotes: -1
Reputation: 14334
In the absence of a sample string, let's use a test case and a few examples of how you can achieve this.
String
My name is James and James is my name
Regex
(James)
Applied globally, group 1 (group 0 is generally the full match string and will likely not have a capture count) is captured twice. This means that the word is repeated. Some logic is required in the tool you are using to execute your regex in order to decide whether you are interested in the 'word'.
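One way that "logic in the tool" could look in JavaScript is to count every word on the line and keep those seen more than once. This is only a sketch: repeatedWords is a hypothetical helper, and it counts all words rather than a single hard-coded one.

```javascript
// Hypothetical helper: returns the words that occur more than once on a line.
// Matching is case-sensitive, so "My" and "my" are counted separately.
function repeatedWords(line) {
  const counts = new Map();
  for (const [word] of line.matchAll(/\b\w+\b/g)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  // Keep only the words that were seen at least twice, in order of first appearance.
  return [...counts].filter(([, n]) => n > 1).map(([word]) => word);
}

const repeated = repeatedWords("My name is James and James is my name");
console.log(repeated); // [ 'name', 'is', 'James' ]
```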
Using the same string, consider this regex
(?<=James.*)(James)
This will detect the word James ONLY if it is preceded by 'James' followed by any number of characters. In most engines the '.' (period) matches any character except a newline by default, which confines the search to a single line. Note that a variable-length lookbehind like this is not supported by every regex engine.
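For reference, this is how the lookbehind version behaves in a JavaScript engine that supports lookbehind assertions (ES2018 and later, for example current Node.js). The variable names are just for this sketch.

```javascript
const later = /(?<=James.*)(James)/g;

// Only the second "James" matches, because only it has a "James" earlier on the line.
const sameLine = "My name is James and James is my name".match(later);
console.log(sameLine); // [ 'James' ]

// '.' does not cross a newline, so a repeat on the next line is not reported.
const nextLine = "James\nJames".match(later);
console.log(nextLine); // null
```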
Note the limitation of having to specify the word exactly. I am not sure how to get around this.
EDIT: Try this, it's a doozy.
(?<=^|\s+\1\s+.*)\s+(\w+)
Using positive lookbehind (as in example 2) we detect 'whole words' that match our current group. A whole word is defined as:
Further, the match we are on must be a standalone word (preceded by at least one space character).
As far as results are concerned, each match will be a repeated word.
Upvotes: 0