Reputation: 27
I need a regular expression that matches any line of input in which the same word is repeated two or more times in a row. Assume there is one space between consecutive words.
if ($line !~ m/(\b(\w+)\b\s){2,}/) { print "No match\n"; }
else {
    print "$`";   # print the part of the string before the match
    print "<$&>"; # highlight the matching part
    print "$'";   # print the rest
}
This is the best I have so far, but something is wrong with it. Correct me if I am wrong; this is how I read the expression:
\b
start with a word boundary
(\w+)
followed by one or more word characters
\b
end with a word boundary
\s
then a space
{2,}
check whether this whole group repeats 2 or more times
What's wrong with my expression?
Upvotes: 1
Views: 4004
Reputation: 193
I tried CAustin's answer in regexr.com and the results were not what I would expect. Also, no need for all the non-capturing groups.
My regex:
(\b(\w+))( \2)+
Word-boundary, followed by (1 or more word characters)[group 2], followed by one or more of: space, group 2.
This next one replaces the space with \s+, generalizing the separation between the words to one or more whitespace characters of any kind:
(\b(\w+))(\s+\2)+
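As a quick check of the second pattern, here is a minimal Perl sketch that wraps the repeated run in angle brackets, the way the question's code does. The helper name highlight_repeats and the sample lines are made up for illustration. The sketch also drops the outer group (so the word is group 1 rather than group 2) and appends a trailing \b, which is not in the pattern above; without it, "foo food" would count as a repeat.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the first run of a repeated word in <...>.
# Returns undef when no word is repeated consecutively.
sub highlight_repeats {
    my ($line) = @_;
    # (\w+) captures a word; (?:\s+\1)+ requires the same word again,
    # one or more times. The trailing \b is an addition that keeps
    # "foo food" from matching.
    if ($line =~ /\b(\w+)(?:\s+\1)+\b/) {
        return "$`<$&>$'";
    }
    return undef;
}

for my $line ("this is is a test", "foo food", "no repeats here") {
    my $out = highlight_repeats($line);
    print defined($out) ? "$out\n" : "No match\n";
}
```

The first sample should come out as "this <is is> a test"; the other two should print "No match".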
Upvotes: 1
Reputation: 35198
You aren't actually checking to see if it's the SAME word that's repeating. To do that, you need to use a captured backreference:
if ($line =~ m/\b(\w+)(?:\s\1){2,}\b/) {
print "matched '$1'\n";
}
Also, anytime you're testing a regular expression, it's helpful to create a list of examples to work with. The following demonstrates one way of doing that using the __DATA__ block:
use strict;
use warnings;
while (my $line = <DATA>) {
if ($line =~ m/\b(\w+)(?:\s\1){2,}\b/) {
print "matched '$1'\n";
} else {
print "no match\n";
}
}
__DATA__
foo foo
foo bar foo
foo foo foo
Outputs:
no match
no match
matched 'foo'
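Note that (?:\s\1){2,} asks for two further copies after the first word, which is why "foo foo" does not match above. If the requirement is read as "the word appears at least twice in a row", a + quantifier would be the variant to use instead. The sketch below compares both readings on the same sample lines; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Reading 1: the word occurs at least twice in a row (one or more extra copies).
my $two_or_more   = qr/\b(\w+)(?:\s\1)+\b/;
# Reading 2: the word occurs at least three times in a row (two or more extra copies).
my $three_or_more = qr/\b(\w+)(?:\s\1){2,}\b/;

for my $line ("foo foo", "foo bar foo", "foo foo foo") {
    my $two   = $line =~ $two_or_more   ? "match" : "no match";
    my $three = $line =~ $three_or_more ? "match" : "no match";
    print "'$line': two-or-more => $two, three-or-more => $three\n";
}
```

Only the first reading accepts "foo foo"; both reject "foo bar foo" and both accept "foo foo foo".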
Upvotes: 0
Reputation: 4614
This should be what you're looking for: (?:\b(\w+)\b) (?:\1(?: |$))+
Also, don't use \s when you're just looking for spaces, as it's possible you'll match a newline or some other whitespace character. Simple spaces aren't delimiters or special characters in regex, so it's fine to just type the space. You can use [ ] if you want it to be more visually apparent.
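To illustrate the difference, here is a small Perl sketch comparing \s, a literal space, and [ ] on a few made-up inputs. The pattern is a simplified two-word version rather than the full expression above, and the variable names are hypothetical. One caveat worth knowing: under the /x modifier a bare space in the pattern is ignored, so there [ ] (or an escaped space) is required rather than optional.

```perl
use strict;
use warnings;

my $with_class   = qr/\b(\w+)\s\1\b/;        # \s: any whitespace, including "\t" and "\n"
my $with_space   = qr/\b(\w+) \1\b/;         # literal space only
my $with_bracket = qr/\b (\w+) [ ] \1 \b/x;  # under /x, a space must be written as [ ]

my @samples = (["foo foo", "space"], ["foo\tfoo", "tab"], ["foo\nfoo", "newline"]);

for my $sample (@samples) {
    my ($text, $label) = @$sample;
    my $class   = $text =~ $with_class   ? "yes" : "no";
    my $space   = $text =~ $with_space   ? "yes" : "no";
    my $bracket = $text =~ $with_bracket ? "yes" : "no";
    print "$label: \\s => $class, literal space => $space, [ ] => $bracket\n";
}
```

Only the \s version matches across a tab or a newline; the literal space and [ ] versions match the plain-space input alone.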
Upvotes: 1