clt60

Reputation: 63902

Find words with repeating characters

I want to find every word in a dictionary that has the same character at the second and the last position, and exactly once more somewhere in between.

Examples that should match:

statement - "t" at the 2nd, 4th and last position
severe - "e" at the 2nd, 4th and last position
abbxb - "b" at the 2nd, 3rd and last position

Examples that should not match:

abab - "b" occurs only 2 times, not 3
abxxxbyyybzzzzb - "b" occurs 4 times, not 3

My grep is not working:

my @ok = grep { /^(.)(.)[^\2]+(\2)[^\2]+(\2)$/ } @wordlist;

For example,

perl -nle 'print if /^(.)(.)[^\2]+(\2)[^\2]+(\2)$/' < /usr/share/dict/words

prints, among others,

zarabanda

which is wrong: "a" occurs four times (2nd, 4th, 6th and last position).
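A quick check, as a sketch, shows why zarabanda slips through: inside a bracketed character class Perl reads \2 as the octal escape for chr(2), not as a backreference, so [^\2] matches almost any character, including extra "a"s. The + quantifiers also demand at least one character between occurrences, which rules out abbxb. The helper names asker_match and in_octal_class are hypothetical.

```perl
use strict;
use warnings;

# The original pattern. Inside [...], \2 is the octal escape for chr(2),
# not a backreference, so [^\2]+ happily matches extra "a"s.
my $asker_re = qr/^(.)(.)[^\2]+(\2)[^\2]+(\2)$/;

# Hypothetical helper: 1 if the word matches the original pattern.
sub asker_match { return $_[0] =~ $asker_re ? 1 : 0 }

# Hypothetical helper: 1 if the whole string is a single char from [\2].
sub in_octal_class { return $_[0] =~ /^[\2]$/ ? 1 : 0 }

print "zarabanda: ", asker_match("zarabanda"), "\n";    # 1, although "a" occurs 4 times
print "abbxb: ",     asker_match("abbxb"),     "\n";    # 0, each [^\2]+ needs a char
print "chr(2): ",    in_octal_class(chr 2),    "\n";    # 1
print "a: ",         in_octal_class("a"),      "\n";    # 0
```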

What should be the correct regex?

EDIT:

And how can I capture the enclosed groups? For example, for statement I want to capture st(a)t(emen)t for later use, with something like:

my $w1 = $1; my $w2 = $2;

Upvotes: 7

Views: 1529

Answers (4)

David

Reputation: 147

my @ok = grep { /^.(\w)/ && /^.$1[^$1]*?$1[^$1]*$1$/ } @wordlist;
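This answer interpolates the second character into a second pattern. That works because [^$1] is assembled as a string before the regex is compiled, unlike [^\2]. Below is a minimal sketch of the same idea as a named sub, run on the question's examples. The sub name david_match is hypothetical, and the second match is only attempted when the first one succeeded, so a stale $1 is never used.

```perl
use strict;
use warnings;

# Hypothetical sub: grab the second character, then interpolate it into a
# second pattern. Returns 0 early if there is no second word character.
sub david_match {
    my ($w) = @_;
    return 0 unless $w =~ /^.(\w)/;
    my $c     = $1;          # a \w character, so no quotemeta is needed
    my $not_c = "[^$c]";     # a real "anything but $c" class, built as a string
    return $w =~ /^.$c$not_c*?$c$not_c*$c$/ ? 1 : 0;
}

my @wordlist = qw(statement severe abbxb abab abxxxbyyybzzzzb zarabanda);
my @ok = grep { david_match($_) } @wordlist;
print "@ok\n";    # statement severe abbxb
```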

Upvotes: 1

perreal

Reputation: 97948

Using lookahead:

/^.(.)(?!(?:.*\1){3}).*\1(.*)\1$/

Meaning:

/^.(.)(?!(?:.*\1){3})  # capture the second character if it is not
                       # repeated more than twice after the 2nd position
.*\1(.*)\1$              # match the captured char twice more, the last time at the end
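The lookahead caps the count and the rest of the pattern sets the minimum. A brute-force restatement without lookahead makes the requirement explicit and can be used to cross-check the regex on the question's examples. This is a sketch only; perreal_match and count_match are hypothetical names, and count_match is a plain counting check, not a regex.

```perl
use strict;
use warnings;

# perreal's pattern, unchanged.
my $perreal_re = qr/^.(.)(?!(?:.*\1){3}).*\1(.*)\1$/;

# Hypothetical helper: 1 if the word matches perreal's pattern.
sub perreal_match { return $_[0] =~ $perreal_re ? 1 : 0 }

# Hypothetical brute-force check, no lookahead: from the second position
# on, the second character occurs exactly three times and is also last.
sub count_match {
    my ($w) = @_;
    return 0 if length($w) < 4;             # 2nd, middle and last need 4+ chars
    my $rest = substr($w, 1);               # everything from the 2nd char on
    my $c    = substr($rest, 0, 1);
    return 0 unless substr($rest, -1) eq $c;
    my $n = () = $rest =~ /\Q$c\E/g;        # count occurrences of $c
    return $n == 3 ? 1 : 0;
}

for my $w (qw(statement severe abbxb abab abxxxbyyybzzzzb zarabanda bbxb)) {
    printf "%-16s regex=%d count=%d\n", $w, perreal_match($w), count_match($w);
}
```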

Upvotes: 1

ikegami

Reputation: 385764

(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR, so what you want is:

^.             # Ignore first char
(.)            # Capture second char
(?:(?!\1).)*   # Any number of chars that aren't the second char
\1             # Second char
(?:(?!\1).)*   # Any number of chars that aren't the second char
\1\z           # Second char at the end of the string.

So you get:

perl -ne'print if /^. (.) (?:(?!\1).)* \1 (?:(?!\1).)* \1$/x' \
   /usr/share/dict/words

To capture what's in between, add parens around both (?:(?!\1).)*. $1 is then the repeated character, and $2 and $3 hold the two stretches between its occurrences:

perl -nle'print "$2:$3" if /^. (.) ((?:(?!\1).)*) \1 ((?:(?!\1).)*) \1\z/x' \
   /usr/share/dict/words
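The same capturing pattern can also be used inside a script to keep the pieces for later use, as the question's edit asks. A minimal sketch, assuming the words are already in a list; the sub name gaps is hypothetical.

```perl
use strict;
use warnings;

# ikegami's capturing pattern: $1 is the repeated character,
# $2 and $3 are the stretches between its three occurrences.
my $re = qr/^ . (.) ((?:(?!\1).)*) \1 ((?:(?!\1).)*) \1 \z/x;

# Hypothetical helper: returns the two gaps for a matching word,
# or an empty list when the word does not match.
sub gaps {
    my ($word) = @_;
    return $word =~ $re ? ($2, $3) : ();
}

for my $word (qw(statement severe abbxb abab zarabanda)) {
    my ($w1, $w2) = gaps($word);
    print defined $w1 ? "$word: $w1:$w2\n" : "$word: no match\n";
}
```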

Upvotes: 12

anubhava

Reputation: 785128

This is the regex that should work for you:

^.(.)(?=(?:.*?\1){2})(?!(?:.*?\1){3}).*?\1$

Live Demo: http://www.rubular.com/r/bEMgutE7t5
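The linked demo runs on Rubular, a Ruby regex tester; the pattern uses only features Perl also supports, so it works unchanged there. A minimal sketch running it over the question's examples (the helper name anubhava_match is hypothetical). Note that $ also matches before a trailing newline, so unchomped lines pass as well.

```perl
use strict;
use warnings;

# anubhava's pattern, unchanged: (?=(?:.*?\1){2}) needs at least two more
# copies of the second character, (?!(?:.*?\1){3}) forbids a third, and
# .*?\1$ requires the word to end with it.
my $anubhava_re = qr/^.(.)(?=(?:.*?\1){2})(?!(?:.*?\1){3}).*?\1$/;

# Hypothetical helper: 1 if the word matches.
sub anubhava_match { return $_[0] =~ $anubhava_re ? 1 : 0 }

my @words = qw(statement severe abbxb abab abxxxbyyybzzzzb zarabanda);
print join(" ", grep { anubhava_match($_) } @words), "\n";    # statement severe abbxb
```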

Upvotes: 5
