Justine Jose

Reputation: 140

Regex: check that a string contains only allowed characters and limit the occurrences of each character

This is my character occurrence limit.

Dictionary<string,int> chracterLimit=new Dictionary<string,int>{{"c",1},{"a",2}};

This is my input string...

var mystring="caac";

Here I use LINQ to check whether any character is used more often than its allowed limit.

bool checkstringvalid=!mystring
  .ToCharArray()
  .Select(c => c.ToString())
  .GroupBy(g => g)
  .ToList()
  .ToDictionary(
     d => d.FirstOrDefault(), 
     d => d.Count())
  .Any(z => z.Value > chracterLimit[z.Key]);

The result of the condition above is false, i.e. the string is invalid, because c occurs 2 times but its allowed limit is only 1.

When I use this function on bulk data, it takes too much time. My question is: how can I do this check more easily?

Can you give me a solution that checks it with a regular expression? I imagine something like /a{0,2}/ /c{0,1}/

Thanks in advance!:)

Upvotes: 0

Views: 293

Answers (4)

wp78de

Reputation: 18970

I don't know why you are after a regex solution here. It will definitely not be faster. Arguably, it's even more complicated and involved if you go beyond your simple example.

For demonstration purposes only, here is your original condition converted to a regular expression:

  • up to one c is allowed
  • up to two a's are allowed
^(?![^c\n]*c[^c\n]*c)(?![^a\n]*a[^a\n]*a[^a\n]*a).*$


The idea here is to use negative lookaheads to reject any string that matches a pattern violating the rules above (two c's or three a's), with negated character classes serving as a modified dot (.). There are other ways to do it. By now you should be convinced not to use regex for this task.
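
For completeness, here is a minimal sketch of how the pattern above could be applied from C# via System.Text.RegularExpressions. The class and method names (RegexLimitCheck, IsWithinLimits, IsWithinLimitsStrict) are made up for illustration. Note that the pattern as written only limits c and a; it does not reject other characters. The second pattern, which replaces .* with [ac]*, is an assumption about how the input could also be restricted to the allowed characters.

```csharp
using System.Text.RegularExpressions;

public static class RegexLimitCheck
{
    // The pattern from above: at most one 'c' and at most two 'a's.
    private static readonly Regex Limits = new Regex(
        @"^(?![^c\n]*c[^c\n]*c)(?![^a\n]*a[^a\n]*a[^a\n]*a).*$",
        RegexOptions.Compiled);

    // Assumed variant (not from the answer): same lookaheads,
    // but only 'a' and 'c' may appear at all.
    private static readonly Regex StrictLimits = new Regex(
        @"^(?![^c\n]*c[^c\n]*c)(?![^a\n]*a[^a\n]*a[^a\n]*a)[ac]*$",
        RegexOptions.Compiled);

    // True if the input stays within the limits for 'c' and 'a'.
    public static bool IsWithinLimits(string input) => Limits.IsMatch(input);

    // True if the input stays within the limits and contains only 'a' and 'c'.
    public static bool IsWithinLimitsStrict(string input) => StrictLimits.IsMatch(input);
}
```

With these, IsWithinLimits("caa") is true and IsWithinLimits("caac") is false. IsWithinLimits("caab") is also true, because the original pattern ignores other characters, whereas IsWithinLimitsStrict("caab") is false.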

Upvotes: 1

Dmitrii Bychenko

Reputation: 186748

When working with symbols, let's work with characters, not strings (we don't want excessive ToString() calls, do we?):

   Dictionary<char, int> chracterLimit = new  Dictionary<char,int>{
     {'c', 1},
     {'a', 2}
   };

Then let's detect counterexamples early, i.e. if we have "aaaaaaaaa....aaa" we have to read only the first 3 a's, not the entire string:

   Dictionary<char, int> actual = new Dictionary<char, int>();

   bool checkStringValid = true;

   foreach (char c in mystring) {
     int count = 0;

     if (actual.TryGetValue(c, out count))
       actual[c] = ++count;  
     else
       actual.Add(c, ++count);

     if (chracterLimit.TryGetValue(c, out var limit)) {
       if (count > limit) {
         checkStringValid = false; // limit exceeded

         break;   
       } 
     }
     else {
       checkStringValid = false;  // invalid character detected

       break;   
     } 
   }  

The code above is optimized for speed; if you are only looking for a more readable solution:

  bool checkstringvalid = !mystring
    .GroupBy(c => c)
    .Any(chunk => chracterLimit.TryGetValue(chunk.Key, out var limit)
       ? chunk.Skip(limit).Any()
       : true);

Upvotes: 1

steliosbl

Reputation: 8921

The LINQ engine is quite smart, so you're unlikely to get much of a performance boost over what you currently have. One thing you could do is cut out unnecessary operations. A cleaner version of what you have would be:

int s;
bool violation = myString.GroupBy(c => c.ToString())
                         .Any(g => characterLimit.TryGetValue(g.Key, out s) && s < g.Count());

This eliminates the conversions from string, to character array, to list, to dictionary.

For anything quicker than this, you'd need to ditch LINQ and go with an iterative approach.
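
To check that claim on your own data, a rough timing sketch along these lines may help. Everything named here (LimitCheckTiming, ViolatesLinq, ViolatesLoop, TimeMs) is hypothetical, and both checks assume that characters without an entry in the dictionary are allowed, as in the query above. Stopwatch gives only a crude measurement, so treat the numbers as indicative.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class LimitCheckTiming
{
    // LINQ version, same shape as the query above: true if any limit is exceeded.
    public static bool ViolatesLinq(string text, Dictionary<string, int> limits)
    {
        return text.GroupBy(c => c.ToString())
                   .Any(g => limits.TryGetValue(g.Key, out int max) && max < g.Count());
    }

    // Iterative version: counts as it goes and stops at the first exceeded limit.
    public static bool ViolatesLoop(string text, Dictionary<char, int> limits)
    {
        var seen = new Dictionary<char, int>();
        foreach (char c in text)
        {
            seen.TryGetValue(c, out int count);   // count stays 0 if c is new
            seen[c] = ++count;
            if (limits.TryGetValue(c, out int max) && count > max)
                return true;
        }
        return false;
    }

    // Runs a check the given number of times and returns the elapsed milliseconds.
    public static long TimeMs(Func<bool> check, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            check();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

For example, compare TimeMs(() => LimitCheckTiming.ViolatesLinq(input, stringLimits), 100000) with TimeMs(() => LimitCheckTiming.ViolatesLoop(input, charLimits), 100000), where input and the two dictionaries are your own data. The results depend on the input and the runtime, so measure before deciding.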

Upvotes: 1

O. Jones

Reputation: 108706

Your LINQ expression has a lot of conversion in it.

How about this kind of thing instead?

 bool IsStringCompliant (string str, Dictionary<char, int> limits) 
 {
     var lim = new Dictionary<char, int>(limits);  // copy dict, allows re-use
     foreach (var c in str) {
       if (lim.ContainsKey(c)) {
           lim[c] -= 1;                   // use up one allowed occurrence
           if (lim[c] < 0) return false;  // limit exceeded
       }
       else return false;  // char is not in dict; return true here instead if such chars should be allowed
    }
    return true;
 }

Then you do this to use that function.

   var characterLimit = new Dictionary<char, int>{{'c', 1}, {'a', 2}};
   var mystring="caac";
   bool checkstringvalid = IsStringCompliant(mystring, characterLimit);

This will be fast for a few reasons.

  1. it uses char rather than string variables of length 1 where possible.
  2. it plays to the C# compiler's loop optimization technology.
  3. it stops searching as soon as it knows a string has failed validity.

Plus it's easier to understand for the next programmer.

Upvotes: 1
