Becker
Becker

Reputation: 301

strlen return wrong answer in c

I am writing a program which convert decimal number to roman number. I use 4 arrays thousands, hundreds, tens, units to store digits in roman number, then copy each digit to res array, I use str pointer to track where the string in res begin. When I test with input 128, it prints out CXXVIIIIX which must be CXXVIII. I have tried to debug and got the result when tmp=8 is strlen(units[tmp-1]) = 6, which means strlen also counts IX. And for some case like 3888, the program prints out trash value.

This is my code

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>
#include <windows.h>

int main(){
    int n=128;

    char thousands[][4]={"M","MM","MMM"};
    char hundreds[][5]={"C","CC","CCC","CD", "D", "DC", "DCC", "DCCC", "CM"};
    char tens[][5]={"X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"};
    char units[][5]={"I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"};

    char res[16];
    res[15]='\0';    //add nul to the last character of res array
    char *str=res+14;   //str tracks where the string start
    int tmp;
    
    
    //store roman digits in res array in reverse order, 
    //start from units->tens->hundreds->thousands
    if (n!=0)
    {
        tmp=n%10;
        if (tmp!=0)
        {
            str-=strlen(units[tmp-1]);  //str steps back several address to store new digits
            strncpy(str, units[tmp-1], strlen(units[tmp-1]));   //use strncpy to copy without '\0'
            
        }
        n/=10;
    }
    
    if (n!=0)
    {
        tmp=n%10;
        if (tmp!=0)
        {
            str-=strlen(tens[tmp-1]);
            strncpy(str, tens[tmp-1], strlen(tens[tmp-1]));
            
        }
        n/=10;
    }
    
    if (n!=0)
    {
        tmp=n%10;
        if (tmp!=0)
        {
            str-=strlen(hundreds[tmp-1]);
            strncpy(str, hundreds[tmp-1], strlen(hundreds[tmp-1]));
            
        }
        n/=10;
    }
    
    if (n!=0)
    {
        tmp=n%10;
        if (tmp!=0)
        {
            str-=strlen(thousands[tmp-1]);
            strncpy(str, thousands[tmp-1], strlen(thousands[tmp-1]));
            
        }
        n/=10;
    }
    
    printf("%s", str);
    return 0;
}

So why does this happen and how to fix it?

Any help would be appreciated.

Upvotes: 0

Views: 234

Answers (2)

Bodo
Bodo

Reputation: 9855

After fixing the array size for all string literals, there is an off-by-one error and an uninitialized char array element.

  • char res[16]; would be enough for "MMMDCCCLXXXVIII" with a trailing '\0'.
  • With res[15]='\0'; you store the string termination in the last element.
  • char *str=res+14; sets the pointer to the uninitialized character before this.
  • str-=strlen(something) moves the pointer according to the length of the string to insert. The following strncpy will not overwrite the uninitialized character.

The result will always contain an uninitialized trailing character which may or may not be visible.

A result of the maximum length ("MMMDCCCLXXXVIII") will begin one character before the first array element because of the trailing uninitialized character. You would have a result string "MMMDCCCLXXXVIII*" where * is uninitialized. The second M is the first array element of res, the first M would be one before the first element.

Example to demonstrate the off-by-one error:

Based on the original (wrong) code from the question.

Note that this is not the full code but a reduced extract intended to show how the variable values change with input value n=3888. (To make the code shorter, the if (tmp!=0) guards from the original code are missing here.)

In the comments that show variable values, * is used to indicate an uninitialized character.

int n=3888;

char res[16];
// * indicates an uninitialized character
res[15]='\0';  // "***************\0"

// The following initialization is wrong.
char *str=res+14; // str = &res[14] (the last uninitialized character)
int tmp;

// The following block leaves the last character of res uninitialized.
tmp=n%10; // tmp = 8
str-=strlen(units[tmp-1]); // "VIII" (strlen = 4) -> str = &res[10]
strncpy(str, units[tmp-1], strlen(units[tmp-1])); // "**********VIII*\0"
n/=10; // n = 388

// Here, str points to the first character written by the previous block.
tmp=n%10; // tmp = 8
str-=strlen(tens[tmp-1]); // "LXXX" (strlen = 4) -> str = &res[6]
strncpy(str, tens[tmp-1], strlen(tens[tmp-1])); // ******LXXXVIII*\0"
n/=10; // n = 38

/* ...*/

// final result would be

// str = &res[-1]
// str -> "MMMDCCCLXXXVIII*\0"
// res =   "MMDCCCLXXXVIII*\0"

// The last strncpy tries to write a character 'M' one 
// element before the beginning of the array res
// corresponding to strncpy(res[-1], "MMM", 3)

As a fix I propose

char res[16] = {0}; // initialize the whole array with 0
char *str=res+(sizeof(res)-1);

Upvotes: 2

chux
chux

Reputation: 153478

... and how to fix it?

  • Consider writing units in 1,000s, then 100s, tens, ones order.

  • Add a "zero" to each unit list.

  • Roll into a helper function.

  • Change types from character arrays to arrays of pointers.

Example

void print_roman(int n) {
  static const char *thousands[] = {"", "M", "MM", "MMM"};
  static const char *hundreds[] = {"", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"};
  static const char *tens[] = {"", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"};
  static const char *ones[] = {"", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"};
  static const char **units[] = {thousands, hundreds, tens, ones};
  int units_n = sizeof units / sizeof units[0];

  assert(n > 0 && n < 4000);
  char res[16];
  char *p = res;
  int multiplier = 1000;
  for (int i = 0; i < units_n; i++) {
    strcpy(p, units[i][n / multiplier]);
    p += strlen(p);
    n %= multiplier;
    multiplier /= 10;
  }
  printf("<%s>\n", res);
}

int main() {
  print_roman(3888);
  print_roman(3999);
  print_roman(1010);
  print_roman(42);
}

Output

<MMMDCCCLXXXVIII>
<MMMCMXCIX>
<MX>
<XLII>

Upvotes: 1

Related Questions