DarkSuniuM
DarkSuniuM

Reputation: 2582

strcat adds junk to the string

I'm trying to reverse a sentence, without changing the order of words,

For example: "Hello World" => "olleH dlroW"

Here is my code:

#include <stdio.h>
#include <string.h>

char * reverseWords(const char *text);
char * reverseWord(char *word);

int main () {
  char *text = "Hello World";
  char *result = reverseWords(text);
  char *expected_result = "olleH dlroW";
  printf("%s == %s\n", result, expected_result);
  printf("%d\n", strcmp(result, expected_result));
  return 0;
}

char *
reverseWords (const char *text) {
  // This function takes a string and reverses it words.
  int i, j;
  size_t len = strlen(text);
  size_t text_size = len * sizeof(char);
  // output containst the output or the result
  char *output;

  // temp_word is a temporary variable,
  // it contains each word and it will be
  // empty after each space.
  char *temp_word;

  // temp_char is a temporary variable,
  // it contains the current character
  // within the for loop below.
  char temp_char;

  // allocating memory for output.
  output = (char *) malloc (text_size + 1);

  for(i = 0; i < len; i++) {

    // if the text[i] is space, just append it
    if (text[i] == ' ') {
      output[i] = ' ';
    }

    // if the text[i] is NULL, just get out of the loop
    if (text[i] == '\0') {
      break;
    }

    // allocate memory for the temp_word
    temp_word = (char *) malloc (text_size + 1);

    // set j to 0, so we can iterate only on the word
    j = 0;

    // while text[i + j] is not space or NULL, continue the loop
    while((text[i + j] != ' ') && (text[i + j] != '\0')) {

      // assign and cast test[i+j] to temp_char as a character,
      // (it reads it as string by default)
      temp_char = (char) text[i+j];

      // concat temp_char to the temp_word
      strcat(temp_word, &temp_char); // <= PROBLEM

      // add one to j
      j++;
    }

    // after the loop, concat the reversed version
    // of the word to the output
    strcat(output, reverseWord(temp_word));

    // if text[i+j] is space, concat space to the output
    if (text[i+j] == ' ')
      strcat(output, " ");

    // free the memory allocated for the temp_word
    free(temp_word);

    // add j to i, so u can skip 
    // the character that already read.
    i += j;
  }

  return output;
}

char *
reverseWord (char *word) {
  int i, j;
  size_t len = strlen(word);
  char *output;

  output = (char *) malloc (len + 1);

  j = 0;
  for(i = (len - 1); i >= 0; i--) {
    output[j++] = word[i];
  }

  return output;
}

The problem is the line I marked with <= PROBLEM, On the first word which in this case is "Hello", it does everything just fine.

On the second word which in this case is "World", It adds junky characters to the temp_word, I checked it with gdb, temp_char doesn't contain the junk, but when strcat runs, the latest character appended to the temp_word would be something like W\006,

It appends \006 to all of the characters within the second word,

The output that I see on the terminal is fine, but printing out strcmp and comparting the result with expected_result returns -94.

Upvotes: 3

Views: 819

Answers (3)

Ikarus
Ikarus

Reputation: 1193

The root cause of junk characters is you use wrong input for the 2nd argument of strcat function. see explain below:

At the beginning of your function you declare:

  int i, j;
  size_t len = strlen(text);
  size_t text_size = len * sizeof(char);
  // output containst the output or the result
  char *output;

  // temp_word is a temporary variable,
  // it contains each word and it will be
  // empty after each space.
  char *temp_word;

  // temp_char is a temporary variable,
  // it contains the current character
  // within the for loop below.
  char temp_char;

you can print variable's addresses in stack, they will be something like this:

printf("&temp_char=%p,&temp_word=%p,&output=%p,&text_size=%p\n", &temp_char, &temp_word,&output,&text_size);
result:    
&temp_char=0x7ffeea172a9f,&temp_word=0x7ffeea172aa0,&output=0x7ffeea172aa8,&text_size=0x7ffeea172ab0

As you can see, &temp_char(0x7ffeea172a9f) is at the bottom of the stack, next 1 byte is &temp_word(0x7ffeea172aa0), next 8 bytes is &output(0x7ffeea172aa8), and so on(I used 64bit OS, so it takes 8 bytes for a pointer)

 // concat temp_char to the temp_word
  strcat(temp_word, &temp_char); // <= PROBLEM

refer strcat description here: http://www.cplusplus.com/reference/cstring/strcat/

the strcat second argument = &temp_char = 0x7ffeea172a9f. strcat considers that &temp_char(0x7ffeea172a9f) is the starting point of the source string, instead of adding only one char as you expect it will append to temp_word all characters starting from &temp_char(0x7ffeea172a9f) , until it meets terminating null character

Upvotes: 2

Vlad from Moscow
Vlad from Moscow

Reputation: 310910

The function strcat deals with strings.

In this code snippet

  // assign and cast test[i+j] to temp_char as a character,
  // (it reads it as string by default)
  temp_char = (char) text[i+j];

  // concat temp_char to the temp_word
  strcat(temp_word, &temp_char); // <= PROBLEM

neither the pointer temp_word nor the pointer &temp_char points to a string.

Moreover array output is not appended with the terminating-zero character for example when the source string consists from blanks.

In any case your approach is too complicated and has many redundant code as for example the condition in the for loop and the condition in the if statement that duplicate each other.

  for(i = 0; i < len; i++) {

    //…

    // if the text[i] is NULL, just get out of the loop
    if (text[i] == '\0') {
      break;
    }

The function can be written simpler as it is shown in the demonstrative program below.

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

char * reverse_words( const char *s )
{
    char *result = malloc( strlen( s ) + 1 );

    if ( result != NULL )
    {
        char *p = result;

        while ( *s != '\0' )
        {
            while ( isblank( ( unsigned char )*s ) )
            {
                *p++ = *s++;
            }


            const char *q = s;

            while ( !isblank( ( unsigned char )*q ) && *q != '\0' ) ++q;

            for ( const char *tmp = q; tmp != s; )
            {
                *p++ = *--tmp;
            }

            s = q;
        }

        *p = '\0';
    }

    return result;
}

int main(void) 
{
    const char *s = "Hello World";

    char *result = reverse_words( s );

    puts( s );
    puts( result );

    free( result );

    return 0;
}

The program output is

Hello World
olleH dlroW

Upvotes: 1

alk
alk

Reputation: 70883

strcat() expects addresses of the 1st character of "C"-strings, which in fact are char-arrays with at least one element being equal to '\0'.

Neither the memory temp_word points to nor the memory &temp_char points to meet such requirements.

Due to this the infamous undefined behaviour is invoked, anything can happen from then on.

A possible fix would be to change

      temp_word = (char *) malloc (text_size + 1);

to become

      temp_word = malloc (text_size + 1); /* Not the issue but the cast is 
                                             just useless in C. */
      temp_word[0] = '\0';

and this

        strcat(temp_word, &temp_char);

to become

        strcat(temp_word, (char[2]){temp_char});

There might be other issues with the rest of the code.

Upvotes: 2

Related Questions