random
random

Reputation: 161

Exercise on recursion in C

How would you solve the following problem involving recursion?

Implement a function with prototype char *repeat(char *s, int n) so that it creates and returns a string which consists of n repetitions of the input string s. For example: if the input is "Hello" and 3, the output is "HelloHelloHello". Use only recursive constructs.

My is solution seems to me quite ugly and I am looking for something cleaner. Here is my code:

char *repeat(char *s, int n) {
  if(n==0) {
     char *ris = malloc(1);
     ris[0] = '\0';
     return ris;
  }
  int a = strlen(s);
  char *ris = malloc(n*a+1);
  char *ris_pre = repeat(s,n-1);
  strcpy(ris,ris_pre);
  strcpy(ris+(n-1)*a,s);
  free(ris_pre);
  return ris;
}

Upvotes: 6

Views: 1636

Answers (5)

aps2012
aps2012

Reputation: 1612

A much more tidy and elegant solution (which I've called Basic Solution) is as follows:

Basic Solution

char *internalRepeat(char *s, int n, size_t total)
{
    return (n > 0)
        ? strcat(internalRepeat(s, n - 1, total + strlen(s)), s)
        : strcpy(malloc(total + 1), "");
}

char *repeat(char *s, int n)
{
    return internalRepeat(s, n, 0);
}

This is the beauty of recursion. The key to this solution uses recursion to incrementally build the length of the result. Parameter total does this (not including the NUL-terminator). When the recursion terminates, the result buffer is allocated once (including the NUL-terminator) and then we use the recursion unwinding to append each copy of sto the result. Basic Solution behaves as follows:

  1. Returns a zero-length string for any number of repetitions of an empty string.
  2. Returns a zero-length string for zero or negative iterations of a non-empty string.
  3. Returns a non-zero-length string for a non-zero-positive number of repetitions on a non-empty string.

If you create a program based on the above functions, the following statements:

printf("Repeat \"\" 0 times: [%s]\n", repeat("", 0));
printf("Repeat \"\" 3 times: [%s]\n", repeat("", 3));
printf("Repeat \"abcde\" 0 times: [%s]\n", repeat("abcde", 0));
printf("Repeat \"abcde\" 1 times: [%s]\n", repeat("abcde", 1));
printf("Repeat \"abcde\" 4 times: [%s]\n", repeat("abcde", 4));

will produce the following output:

Repeat "" 0 times: []
Repeat "" 3 times: []
Repeat "abcde" 0 times: []
Repeat "abcde" 1 times: [abcde]
Repeat "abcde" 4 times: [abcdeabcdeabcdeabcde]



EDIT : Optimised Solution follows. Read on if you're interested in optimisation techniques.



All the other proposals here principally run in O(n^2) and allocate memory at every iteration. Even though Basic Solution is elegant, uses only a single malloc(), and takes only two statements, surprisingly Basic Solution also has a running time of O(n^2). This makes it very inefficient if string s is long and means that Basic Solution is no more efficient than any other proposal here.

Optimised Solution

The following is an optimal solution to this problem that actually runs in O(n):

char *internalRepeat(char *s, int n, size_t total, size_t len)
{
    return (n > 0)
        ? strcpy(internalRepeat(s, n - 1, total, len), s) + len
        : strcpy(malloc(total + 1), "");
}

char *repeat(char *s, int n)
{
    int len = strlen(s);

    return internalRepeat(s, n, n * len, len) - (n * len);
}

As you can see, it now has three statements and uses one more parameter, len, to cache the length of s. It recursively uses len to compute the position within the result buffer where the n'th copy of s will be positioned, so allowing us to replace strcat() with strcpy() for each time s is added to the result. This gives an actual running time of O(n), not O(n^2).

What's the difference between the Basic and Optimised solutions?

All other solutions have used strcat() at least n times on string s to append n copies of s to the result. This is where the problem lies, because the implementation of strcat() hides an inefficiency. Internally, strcat() can be thought of as:

strcat = strlen + strcpy

i.e., when appending, you first have to find the end of the string you're appending to before you can do the append itself. This hidden overhead means that, in fact, creating n copies of a string requires n length checks and n physical copying operations. However, the real problem lies in that for each copy of s we append, our result gets longer. This means that each successive length check within strcat() on the result is also getting longer. If we now compare the two solutions using "number of times we have to scan or copy s" as our basis for comparison, we can see where the difference in the two solutions lies.

For n copies of the string s, the Basic Solution performs as follows:

strlen's/iteration: 2
strcpy's/iteration: 1

Iteration | Init | 1 | 2 | 3 | 4 | ... | n |   Total    |
----------+------+---+---+---+---+-----+---+------------+
Scan "s"  |   0  | 1 | 2 | 3 | 4 | ... | n | (n+1)(n/2) |
Copy "s"  |   0  | 1 | 1 | 1 | 1 | ... | 1 |     n      |

whereas the Optimised Solution performs like this:

strlen's/iteration: 0
strcpy's/iteration: 1

Iteration | Init | 1 | 2 | 3 | 4 | ... | n |    Total   |
----------+------+---+---+---+---+-----+---+------------+
Scan "s"  |   1  | 0 | 0 | 0 | 0 | ... | 0 |      1     |
Copy "s"  |   0  | 1 | 1 | 1 | 1 | ... | 1 |      n     |

As you can see from the table, the Basic Solution performs (n^2 + n)/2 scans of our string due to the built-in length check in strcat(), whereas the Optimised Solution always does (n + 1) scans. This is why the Basic Solution (and every other solution that relies on strcat()) performs in O(n^2), whereas the Optimised Solution performs in O(n).

How does O(n) compare to O(n^2) in real terms?

Running times make a huge difference when large strings are being used. As an example, let's take a string s of 1MB that we wish to create 1,000 copies of (== 1GB). If we have a 1GHz CPU that can scan or copy 1 byte/clock cycle, then 1,000 copies of s will be generated as follows:

Note: n is taken from performance tables above, and represents a single scan of s.

Basic:  (n + 1) * (n / 2) + n = (n ^ 2) / 2 + (3n / 2)
                              = (10^3 ^ 2) / 2 + (3 * 10^3) / 2
                              = (5 * 10^5) + (1.5 * 10^2)
                              = ~(5 * 10^5) (scans of "s")
                              = ~(5 * 10^5 * 10^6) (bytes scanned/copied)
                              = ~500 seconds (@1GHz, 8 mins 20 secs).

Optimised: (n + 1)            = 10^3 + 1
                              = ~10^3 (scans of "s")
                              = ~10^3 * 10^6 (bytes scanned/copied)
                              = 1 second (@1Ghz)

As you can see, the Optimised Solution, which completes nearly instantly, demolishes the Basic Solution which takes nearly 10 minutes to complete. However, if you think making string s smaller will help, this next result will horrify you. Again, on a 1GHz machine that processes 1 byte/clock cycle, we take s as 1KB (1 thousand times smaller), and make 1,000,000 copies (total == 1GB, same as before). This gives:

Basic:  (n + 1) * (n / 2) + n = (n ^ 2) / 2 + (3n / 2)
                              = (10^6 ^ 2) / 2 + (3 * 10^6) / 2
                              = (5 * 10^11) + (1.5 * 10^5)
                              = ~(5 * 10^11) (scans of "s")
                              = ~(5 * 10^11 * 10^3) (bytes scanned/copied)
                              = ~50,000 seconds (@1GHz, 833 mins)
                              = 13hrs, 53mins, 20 secs

Optimised: (n + 1)            = 10^6 + 1
                              = ~10^6 (scans of "s")
                              = ~10^6 * 10^3 (bytes scanned/copied)
                              = 1 second (@1Ghz)

This is a truly shocking difference. Optimised Solution performs in the same time as before as the total amount of data written is the same. However, Basic Solution stalls for over half a day building the result. This is the difference in running times between O(n) and O(n^2).

Upvotes: 9

phoxis
phoxis

Reputation: 61910

This is one:

char *repeat (char *str, int n)
{
  char *ret_str, *new_str;

  if (n == 0)
  {
    ret_str = strdup ("");
    return ret_str;
  }
  ret_str = repeat (str, n-1);
  new_str = malloc (sizeof (char) * strlen (str) * (n + 1));
  new_str[0] = '\0';
  strcpy (new_str, ret_str);
  strcat (new_str, str);
  free (ret_str);
  return new_str;
}

We can get someone neater looking code with realloc ()

char *repeat (char *str, int n)
{
  char *ret_str;

  if (n == 0)
  {
    ret_str = strdup ("");
    return ret_str;
  }
  ret_str = repeat (str, n-1);
  ret_str = realloc (ret_str, sizeof (char) * strlen (str) * (n + 1));
  strcat (ret_str, str);
  return ret_str;
}

EDIT 1

Ok, this one is more compact.

char *repeat (char *str, int n)
{
  static char *ret_str;
  static int n_top = -1;

  if (n >= n_top)
    ret_str = calloc (sizeof (char), strlen (str) * n + 1);
  if (n <= 0)
    return ret_str;

  n_top = n;

  return strcat (repeat (str, n-1), str);
}

We use a static buffer to hold the final string, therefore one single buffer is used throughout in all the levels of recursion.

The static int n_top holds the value of the previous value of n from recursive calls. This is initialized by -1 to handle the case when called with n = 0, and thus it returns an empty string (and for which calloc is used to initialize with 0). At the first recursive call the value is -1 therefore only at the top level n > n_top is true (as n is always decreasing), and in this case the entire buffer is allocated ret_str. Else we find for the bottom condition, which is when n becomes 0. At this point when n = 0 we return the address of the pre-allocated static buffer ret_str to the parent callers in the recursion tree. This single buffer is then used by each level of recursion appended by str and handed over to the previous level, until it reaches the main.

EDIT 2

Even more compact one, but ugly

char *repeat (char *str, int n)
{
  static int n_top;
  n_top = (n_top == 0)? n: n_top;
  return (n <= 0)?(n=n_top,n_top=0,calloc (sizeof (char), strlen (str) * n + 1)):strcat (repeat (str, n-1), str);
}

The last compact code had a problem if you would use call with repeat (str, n); repeat (str, 0);. This implementation overcomes that problem and also it is even more compact one also using only one function.

Note that there is an ugly (n=n_top,n_top=0,calloc (sizeof (char), strlen (str) * n + 1)). Here we ensure while rolling back up we use the value of n_top to allocate memory and then reset n_top to 0 so that the function has n_top set to 0 in the next call from main () or other main caller (not recursive). This can be done in more readable way, but this looks cool. I would recommend to stick with a more readable one.

EDIT 3

The mad-man version

This overcomes repetitive strlen () calls. strlen () is called only once, and then the value of the string length along with the value of n in the current depth is used to find the offset value which indicates the end of the final string being returned (whose address is not stored in any intermediate variable, just returned and passed). When passing the string to memcpy we add the offset and give the source memory location to memcpy by adding offset to the returned answer string from the immediately next depth. This actually provides memcpy the location immediately after the string end, after which memcpy copies the stuff str of length str_len. Note that memcpy will return the destination address which it was passed, that is the answer string end address of this depth, but we need the actual beginning, which is achieved by going back offset from this returned value, and that is why offset is subtracted before returning.

Note this still uses single function :D

char *repeat (char *str, int n)
{
  static int n_top, str_len;
  int offset = 0;

  (n_top == 0)?(n_top = n,str_len = strlen (str)):(offset = str_len * (n_top-n));
  return (n <= 0)?(n=n_top,n_top=0,malloc (str_len * n + 1)):(memcpy (repeat (str, n-1) + offset, str, str_len) - offset);
}

Some notes:

  • We may have done offset = str_len * (n-1) in which case in the first depth the str would be copied in offset 0, from subsequent recursive depths it would copy the string to the answer string from reverse.

  • When doing memcpy we tell it to copy n bytes, which does not include \0 . But as we use calloc to allocate the final destination memory with the space for the terminating `'\0' character, it is initialized to 0. Therefore the final string will be '\0' terminated.

  • sizeof (char) is always 1

  • To make it look more compact and cryptic remove offset calculation and directly calculate the offset in the last return expression.

  • DO NOT use this code in real life.

Upvotes: 2

slashmais
slashmais

Reputation: 7155

A possible solution:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


char *repeat(char *s, int n)
{
    static char *sret=NULL;
    static int isnew=1;

    if (!s || !s[0])
    {
        if (sret) { free(sret); sret=NULL; }
        return "";
    }

    if (n<=0) return "";

    if (isnew)
    {
        int nbuf = strlen(s)*n + 1;
        sret = (char*)realloc(sret, nbuf);
        memset(sret, 0, nbuf);
        isnew=0;
    }

    strcat(sret,s);
    repeat(s, n-1);
    isnew = 1;
    return sret;
}

int main()
{
    char *s = repeat("Hello",50);
    printf("%s\n", s);

    s = repeat("Bye",50);
    printf("%s\n", s);

    repeat(NULL,0); /* this free's the static buffer in repeat() */

    s = repeat("so long and farewell",50);
    printf("%s\n", s);

    return 0;
}

[edit]
A variation of aps2012's solution that uses a single function, but with a static int:

char *repeat(char *s, int n)
{
    static int t=0;
    return (n > 0) 
        ? (t += strlen(s),strcat(repeat(s, n - 1), s)) 
        : strcpy(malloc(t + 1), "");
}

The caller has to free() the returned string to avoid memory leaks.

Upvotes: 0

Adam Liss
Adam Liss

Reputation: 48290

Here's a solution that requires a bit more code, but it runs in O(log n) time instead of O(n):

// Return a string containing 'n' copies of 's'
char *repeat(int n, char *s) {
  return concat((n-1) * strlen(s), strdup(s));
}

// Append 'charsToAdd' characters from 's' to 's', charsToAdd >= 0
char *concat(int charsToAdd, char *s) {
  int oldLen = strlen(s);
  if (charsToAdd <= n) {  // Copy only part of the original string.
    char *longerString = malloc((oldLen + charsToAdd + 1) * sizeof(char));
    strcpy(longerString, s);
    strncat(longerString, s, charsToAdd);
    return longerString;
  } else { // Duplicate s and recurse.
    char *longerString = malloc((2 * oldLen + 1) * sizeof(char));
    strcpy(longerString, s);
    strcat(longerString, s);
    free(s);  // Free the old string; the recusion will allocate a new one.
    return concat(charsToAdd - oldLen, longerString);
  }
}

Upvotes: 0

giorashc
giorashc

Reputation: 13713

Try this approach where you allocate the string only once :

char *repeat(char *s, int n) {
   int srcLength = strlen(s);
   int destLength = srcLength * n + 1;      
   char *result = malloc(destLength);
   result[0] = '\0'; // This is for strcat calls to work properly

   return repeatInternal(s, result, n);
}

char *repeatInternal(char *s, char *result, int n) {
  if(n==0) {
     return result;
  }

  strcat(s, result);  
  return repeat(result, s, n-1);
}

The second repeat method should be used only by the first one. (the first one is your prototype method)

Note : I did not compile/test it but this should work.

Upvotes: 3

Related Questions