Bhargav Rao
Bhargav Rao

Reputation: 52151

Finding periodic strings using string functions

I'm looking for a way to check if a string is periodic or not using JavaScript.

Sample string to match can be 11223331122333. Whereas, 10101 should not match.

Coming from python, I used the RegEx

/(.+?)\1+$/

But it is quite slow. Are there any string methods that can do the trick?

Upvotes: 25

Views: 2652

Answers (5)

Walter Tross
Walter Tross

Reputation: 12644

There is an answer which deserves mentioning for its sheer beauty. It's not mine, I have only adapted it from the Python version, which is here: How can I tell if a string repeats itself in Python?

function is_periodic(s)
{
   return (s + s.substring(0, s.length >> 1)).indexOf(s, 1) > 0;
}

Unfortunately, the speed is not on par with the beauty (and also the beauty has suffered a bit in the adaptation from Python, since indexOf() has a start parameter, but not a stop parameter). A comparison with the regex solution(s) and with the functions of my other answer is here. Even with strings of random length in [4, 400] based on a small 4 character alphabet the functions of my other answer perform better. This solution is faster than the regex solution(s), though.

This solution might be called the “phaseshift solution”. The string is treated like a wave which is identical to itself when shifting its phase.

The advantage of this solution over the ones of my other answer is that it can be easily adapted to return the shortest repeating substring, like this:

function repeating_substr(s)
{
    period = (s + s.substring(0, s.length >> 1)).indexOf(s, 1);
    return period > 0 ? s.substr(0, period) : null;
}

Upvotes: 4

Walter Tross
Walter Tross

Reputation: 12644

The idea of the code below is to consider substrings of all lengths the original string can be divided into evenly, and to check whether they repeat across the original string. A simple method is to check all divisors of the length from 1 to the square root of the length. They are divisors if the division yields an integer, which is also a complementary divisor. E.g., for a string of length 100 the divisors are 1, 2, 4, 5, 10, and the complementary divisors are 100 (not useful as substring length because the substring would appear only once), 50, 25, 20 (and 10, which we already found).

function substr_repeats(str, sublen, subcount)
{
   for (var c = 0; c < sublen; c++) {
      var chr = str.charAt(c);
      for (var s = 1; s < subcount; s++) {
         if (chr != str.charAt(sublen * s + c)) {
            return false;
         }
      }
   }
   return true;
}

function is_periodic(str)
{
   var len = str.length;
   if (len < 2) {
      return false;
   }
   if (substr_repeats(str, 1, len)) {
      return true;
   }
   var sqrt_len = Math.sqrt(len);
   for (var n = 2; n <= sqrt_len; n++) { // n: candidate divisor
      var m = len / n; // m: candidate complementary divisor
      if (Math.floor(m) == m) {
         if (substr_repeats(str, m, n) || n != m && substr_repeats(str, n, m)) {
            return true;
         }
      }
   }
   return false;
}

Unfortunately there is no String method for comparing to a substring of another string in place (e.g., in C language that would be strncmp(str1, str2 + offset, length)).


Say your string has a length of 120, and consists of a substring of length 6 repeated 20 times. You can look at it also as consisting of a sublength (length of substring) 12 repeated 10 times, sublength 24 repeated 5 times, sublength 30 repeated 4 times, or sublength 60 repeated 2 times (the sublengths are given by the prime factors of 20 (2*2*5) applied in different combinations to 6). Now, if you check whether your string contains a sublength of 60 repeated 2 times, and the check fails, you can also be sure that it won't contain any sublength which is a divisor (i.e., a combination of prime factors) of 60, including 6. In other words, many checks made by the above code are redundant. E.g., in the case of length 120, the above code checks (luckily failing quickly most of the time) the following sublengths: 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, 60 (in this order: 1, 60, 2, 40, 3, 30, 4, 24, 5, 20, 6, 15, 8, 12, 10). Of these, only the following are necessary: 24, 40, 60. These are 2*2*2*3, 2*2*2*5, 2*2*3*5, i.e., the combinations of primes of 120 (2*2*2*3*5) with one of each (nonrepeating) prime taken out, or, if you prefer, 120/5, 120/3, 120/2. So, forgetting for a moment that efficient prime factorization is not a simple task, we can restrict our checks of repeating substrings to p substrings of sublength length/p, where p is a prime factor of length. The following is the simplest nontrivial implementation:

function substr_repeats(str, sublen, subcount) { see above }

function distinct_primes(n)
{
   var primes = n % 2 ? [] : [2];
   while (n % 2 == 0) {
      n /= 2;
   }
   for (var p = 3; p * p <= n; p += 2) {
      if (n % p == 0) {
         primes.push(p);
         n /= p;
         while (n % p == 0) {
            n /= p;
         }
      }
   }
   if (n > 1) {
      primes.push(n);
   }
   return primes;
}

function is_periodic(str)
{
   var len = str.length;
   var primes = distinct_primes(len);
   for (var i = primes.length - 1; i >= 0; i--) {
      var sublen = len / primes[i];
      if (substr_repeats(str, sublen, len / sublen)) {
         return true;
      }
   }
   return false;
}

Trying out this code on my Linux PC I had a surprise: on Firefox it was much faster than the first version, but on Chromium it was slower, becoming faster only for strings with lengths in the thousands. At last I found out that the problem was related to the array that distinct_primes() creates and passes to is_periodic(). The solution was to get rid of the array by merging these two functions. The code is below and the test results are on http://jsperf.com/periodic-strings-1/5

function substr_repeats(str, sublen, subcount) { see at top }

function is_periodic(str)
{
   var len = str.length;
   var n = len;
   if (n % 2 == 0) {
      n /= 2;
      if (substr_repeats(str, n, 2)) {
         return true;
      }
      while (n % 2 == 0) {
         n /= 2;
      }
   }
   for (var p = 3; p * p <= n; p += 2) {
      if (n % p == 0) {
         if (substr_repeats(str, len / p, p)) {
            return true;
         }
         n /= p;
         while (n % p == 0) {
            n /= p;
         }
      }
   }
   if (n > 1) {
      if (substr_repeats(str, len / n, n)) {
         return true;
      }
   }
   return false;
}

Please remember that the timings collected by jsperf.org are absolute, and that different experimenters with different machines will contribute to different combinations of channels. You need to edit a new private version of the experiment if you want to reliably compare two JavaScript engines.

Upvotes: 26

jpsecher
jpsecher

Reputation: 4879

A direct approach is to divide the string into equal-sized chunks and test whether each chuck is the same as the first chunk. Here is an algorithm does that by increasing the chunk size from 1 to length/2, skipping chunk sizes that do not cleanly divide the length.

function StringUnderTest (str) {
    this.str = str;
    this.halfLength = str.length / 2;
    this.period = 0;
    this.divideIntoLargerChunksUntilPeriodicityDecided = function () {
        this.period += 1;
        if (this.period > this.halfLength)
            return false;
        if (this.str.length % this.period === 0)
            if (this.currentPeriodOk())
                return true;
        return this.divideIntoLargerChunksUntilPeriodicityDecided();
    };
    this.currentPeriodOk = function () {
        var patternIx;
        var chunkIx;
        for (chunkIx=this.period; chunkIx<this.str.length; chunkIx+=this.period)
            for (patternIx=0; patternIx<this.period; ++patternIx)
                if (this.str.charAt(patternIx) != this.str.charAt(chunkIx+patternIx))
                    return false;
        return true;
    };
}

function isPeriodic (str) {
    var s = new StringUnderTest(str);
    return s.divideIntoLargerChunksUntilPeriodicityDecided();
}

I have not tested the speed, though...

Upvotes: 4

Sitaru Andrei
Sitaru Andrei

Reputation: 51

If a string is periodic:

  • The last element would be the last element of the period as well
  • The period length would divide the string length

So we can make a super greedy algorithm that takes the last element and checks for occurrences up until half of the length. When we find one, we check if the substring's length divides the main length and only after that we would test against the string.

function periodic(str){
    for(var i=0; i<=str.length/2; i++){
        if(str[i] === str[str.length-1] && str.length%(i+1) === 0){
            if (str.substr(0,i+1).repeat(str.length/(i+1)) === str){
        return true;
            }
        }
    }
    return false;
}

Upvotes: 5

James Thorpe
James Thorpe

Reputation: 32212

One option is to continue using a regex, but to make it greedy by dropping the ?:

/^(.+)\1+$/

Depending on the exact input strings, it may reduce the amount of backtracking required and speed up the matching.

Upvotes: 12

Related Questions