Zapdos772

Reputation: 69

Word made up of exactly 4 different letters using regular expressions?

For example, reassesses should match: it contains exactly 4 different characters, 'r', 'e', 'a' and 's'.

My attempt is: /^([a-z])([a-z])([a-z])([a-z])(\1|\2|\3|\4)(\1|\2|\3|\4)(\1|\2|\3|\4)$/ (add as many (\1|\2|\3|\4) as required to match the length of the word)

However, this also matches words with fewer than 4 different letters, and it works only if the 4 letters are the first 4 characters.

Is there a better solution?

Upvotes: 6

Views: 589

Answers (4)

aliteralmind

Reputation: 20163

As far as regex goes, this is a brain-buster. Here is a non-regex solution: a function that uses a map to keep track of unique characters. It returns false as soon as one distinct character too many is seen, and true only if the string contains exactly the required number of distinct characters.

import  java.util.Map;
import  java.util.TreeMap;

/**
   <P>{@code java ExactlyFourDiffChars}</P>
 **/
public class ExactlyFourDiffChars  {
   public static final void main(String[] ignored)  {
      System.out.println("aabbccdd: " + hasExactlyXUniqueChars(4, "aabbccdd"));
      System.out.println("dictionary: " + hasExactlyXUniqueChars(4, "dictionary"));
      System.out.println("reassesses: " + hasExactlyXUniqueChars(4, "reassesses"));
   }
   public static final boolean hasExactlyXUniqueChars(int requiredChars, String str)  {
      // The map is used as a set: only its keys matter.
      Map<Character,Object> charMap = new TreeMap<Character,Object>();

      for(int i = 0; i < str.length(); i++)  {
         Character C = str.charAt(i);
         if(!charMap.containsKey(C))  {
            charMap.put(C, null);

            // Stop early once one distinct character too many has been seen.
            if(charMap.size() > requiredChars)  {
               return  false;
            }
         }
      }
      // Also reject strings with too few distinct characters, e.g. "aaaa".
      return  charMap.size() == requiredChars;
   }
}

Output:

[C:\java_code\]java ExactlyFourDiffChars
aabbccdd: true
dictionary: false
reassesses: true

Upvotes: 0

Casimir et Hippolyte

Reputation: 89629

Something like this:

^([a-z])\1*+([a-z])(?:\1|\2)*+([a-z])(?:\1|\2|\3)*+([a-z])(?:\1|\2|\3|\4)*$

The possessive quantifiers are essential in this pattern: they forbid backtracking, which prevents the next capturing group from matching a letter that has already been found.
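To illustrate why the possessive quantifiers matter, here is a minimal Java sketch that compares the pattern above with a greedy variant. The class name PossessiveDemo and the GREEDY pattern are made up for this comparison and are not part of the answer. With plain greedy quantifiers, \1* can give characters back, so "aaaa" ends up matching with all four groups capturing 'a'.

```java
import java.util.regex.Pattern;

final class PossessiveDemo {
    // Pattern from this answer; backslashes are doubled for the Java string literal.
    static final Pattern POSSESSIVE = Pattern.compile(
        "^([a-z])\\1*+([a-z])(?:\\1|\\2)*+([a-z])(?:\\1|\\2|\\3)*+([a-z])(?:\\1|\\2|\\3|\\4)*$");

    // Hypothetical variant for comparison only: the same pattern with greedy quantifiers.
    static final Pattern GREEDY = Pattern.compile(
        "^([a-z])\\1*([a-z])(?:\\1|\\2)*([a-z])(?:\\1|\\2|\\3)*([a-z])(?:\\1|\\2|\\3|\\4)*$");

    public static void main(String[] args) {
        for (String word : new String[] {"aaaa", "reassesses"}) {
            System.out.println(word
                + " possessive=" + POSSESSIVE.matcher(word).matches()
                + " greedy=" + GREEDY.matcher(word).matches());
        }
    }
}
```

The possessive version rejects "aaaa" and accepts "reassesses", while the greedy version accepts both.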

Possessive quantifiers are available in Java (don't forget to double-escape the backreferences in a string literal). If you need the pattern in a language that lacks this feature, you can find several ways to "translate" it in my comment.

The above pattern is built to check a whole string. If you want to find words in a larger string, you can use this instead (optionally with the case-insensitive option):

(?<![a-z])([a-z])\1*+([a-z])(?:\1|\2)*+([a-z])(?:\1|\2|\3)*+([a-z])(?:\1|\2|\3|\4)*(?![a-z])
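As a rough sketch of how the word-search pattern could be used from Java, the snippet below collects every match in a sentence. The class name WordSearch and the helper findWords are hypothetical names chosen for this example. It assumes Pattern.CASE_INSENSITIVE as the case-insensitive option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordSearch {
    // Word-search pattern from this answer, with doubled backslashes for the string literal.
    static final Pattern WORD = Pattern.compile(
        "(?<![a-z])([a-z])\\1*+([a-z])(?:\\1|\\2)*+([a-z])(?:\\1|\\2|\\3)*+([a-z])(?:\\1|\\2|\\3|\\4)*(?![a-z])",
        Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns every word made of exactly 4 different letters.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findWords("He reassesses the dictionary"));
    }
}
```

For the sample sentence only "reassesses" is collected: "He" and "the" have too few letters, and "dictionary" has too many different ones.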

Upvotes: 1

Khaelex

Reputation: 762

Try

^([a-z])\1*([a-z])(\1*\2*)*([a-z])(\1*\2*\4*)*([a-z])(\1*\2*\4*\6*)*$

Edit: to avoid matching words with fewer than 4 unique letters (e.g. aaaa), add negative lookaheads before each new letter:

^([a-z])\1*(?!\1)([a-z])(\1*\2*)*(?!\1)(?!\2)([a-z])(\1*\2*\4*)*(?!\1)(?!\2)(?!\4)([a-z])(\1*\2*\4*\6*)*$

Upvotes: 1

user557597

Reputation:

This definitely works.
The lookahead finds 4 distinct characters somewhere in the string, and the rest of the pattern then requires the whole string (length >= 4) to consist only of those 4 characters.

 #  ^(?=.*(.).*(?!\1)(.).*(?!\1|\2)(.).*(?!\1|\2|\3)(.))(?:\1|\2|\3|\4)+$

 ^ 
 (?=
      .* 
      ( . )
      .* 
      (?! \1 )
      ( . )
      .* 
      (?! \1 | \2 )
      ( . )
      .* 
      (?! \1 | \2 | \3 )
      ( . )
 )
 (?: \1 | \2 | \3 | \4 )+
 $ 

Perl test case:

if ("upepipipeu" =~ /^(?=.*(.).*(?!\1)(.).*(?!\1|\2)(.).*(?!\1|\2|\3)(.))(?:\1|\2|\3|\4)+$/)
{
      print "unique chars: '$1'  '$2'  '$3'  '$4'\n";
      print "matched:      '$&'\n";
}

Output >>

unique chars: 'i'  'p'  'e'  'u'
matched:      'upepipipeu'

Test case for @aliteralmind:

@Ary = ("aabbccdd", "dictionary", "reassess", "aaaa");

for( @Ary )
{
    if ("$_" =~ /^(?=.*(.).*(?!\1)(.).*(?!\1|\2)(.).*(?!\1|\2|\3)(.))(?:\1|\2|\3|\4)+$/)
    {
       print "unique chars: '$1'  '$2'  '$3'  '$4'\n";
       print "matched:      '$&'\n\n";
    }
    else
    {
       print "Failed-> '$_'\n\n";
    }
}

Output >>

unique chars: 'a'  'b'  'c'  'd'
matched:      'aabbccdd'

Failed-> 'dictionary'

unique chars: 'r'  'a'  'e'  's'
matched:      'reassess'

Failed-> 'aaaa'
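
For readers who want to try the same pattern outside Perl, here is a minimal Java sketch. The class name LookaheadFourChars and the method hasExactlyFour are hypothetical. It relies on Java keeping the groups captured inside a positive lookahead, so the backreferences after the lookahead can use them.

```java
import java.util.regex.Pattern;

final class LookaheadFourChars {
    // Same pattern as in the Perl test, with doubled backslashes for the Java string literal.
    static final Pattern FOUR = Pattern.compile(
        "^(?=.*(.).*(?!\\1)(.).*(?!\\1|\\2)(.).*(?!\\1|\\2|\\3)(.))(?:\\1|\\2|\\3|\\4)+$");

    // Hypothetical helper: true if the string is made of exactly 4 distinct characters.
    static boolean hasExactlyFour(String s) {
        return FOUR.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aabbccdd", "dictionary", "reassess", "aaaa"}) {
            System.out.println(s + ": " + hasExactlyFour(s));
        }
    }
}
```

As in the Perl run, "aabbccdd" and "reassess" match while "dictionary" and "aaaa" fail.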

Upvotes: 1
