Nicola Mori

Reputation: 889

Copy a struct with a string member in C

I have a simple struct containing a string defined as a char array. I thought that copying an instance of the struct to another instance using the assignment operator would simply copy the memory address stored in the char pointer. Instead it seems that the string content is copied. I put together a very simple example:

#include <stdio.h>
#include <string.h>

struct Test{
  char str[20];
};

int main(){

  struct Test t1, t2;
  strcpy(t1.str, "Hello");
  strcpy(t2.str, "world");
  printf("t1: %s %p\n", t1.str, (void*)(t1.str));
  printf("t2: %s %p\n", t2.str, (void*)(t2.str));
  t2 = t1;
  printf("t2: %s %p\n", t2.str, (void*)(t2.str));
  return 0;
}

Compiling this code with gcc 4.9.2 I get:

t1: Hello 0x7fffb8fc9df0
t2: world 0x7fffb8fc9dd0
t2: Hello 0x7fffb8fc9dd0

As I understand, after t2 = t1 t2.str points to the same memory address it pointed before the assignment, but now inside that address there is the same string found inside t1.str. So it seems to me that the string content has been automatically copied from one memory location to another, something that I thought C would not do. I think that this behaviour is triggered by the fact that I declared str as a char[], not as a char*. Indeed, trying to assign directly one string to another with t2.str = t1.str gives this error:

Test.c: In function ‘main’:
Test.c:17:10: error: assignment to expression with array type
   t2.str = t1.str;
      ^

which makes me think that arrays are effectively treated differently than pointers in some cases. Still, I can't figure out what the rules for array assignment are, or in other words why an array inside a struct is copied when I copy one struct into another, yet I can't directly copy one array into another.

Upvotes: 4

Views: 8276

Answers (4)

Richard Chambers

Reputation: 17593

In C a struct is a way for the compiler to know how to structure an area of memory. A struct is a kind of template or stencil which the C compiler uses to figure out how to calculate offsets to the various members of the struct.

The first C compilers did not allow struct assignment, so people had to use the memcpy() function to copy structs; later compilers added it. A C compiler performs a struct assignment by copying the bytes of the struct area of memory from one address to another, typically including any padding bytes added for address alignment. Whatever happens to be in the source memory area is copied to the destination area. Nothing smart is done about the copy: it just copies so many bytes of data from one memory location to another.

If you have a string array in the struct or any kind of an array then the entire array will be copied since that is part of the struct.

If the struct contains pointer variables then those pointer variables will also be copied from one area to another. The result is that you will have two structs with the same data. Because the two areas are byte-for-byte copies of each other, a particular pointer in one struct holds exactly the same address as the corresponding pointer in the other struct, and both point to the same location. The memory they point to is not duplicated.

Remember that a struct assignment is just copying bytes of data from one area of memory to another. For instance if we have a simple struct with a char array with the C source looking like:

typedef struct {
    char tt[50];
} tt_struct;

void test (tt_struct *p)
{
    tt_struct jj = *p;

    tt_struct kk;

    kk = jj;
}

The assembler listing output by the Visual Studio 2005 C++ compiler in debug mode for the assignment of kk = jj; looks like:

; 10   :    tt_struct kk;
; 11   : 
; 12   :    kk = jj;

  00037 b9 0c 00 00 00   mov     ecx, 12            ; 0000000cH
  0003c 8d 75 c4         lea     esi, DWORD PTR _jj$[ebp]
  0003f 8d 7d 88         lea     edi, DWORD PTR _kk$[ebp]
  00042 f3 a5            rep movsd
  00044 66 a5            movsw

This bit of code copies the data 4 bytes at a time from one location in memory to another: rep movsd with ecx set to 12 moves 12 doublewords (48 bytes), and the final movsw moves the remaining 2 bytes, for 50 bytes in total. With a smaller char array size, the compiler may opt for a different, more efficient series of instructions to copy the memory.

In C arrays are not really handled in a smart way. An array is not seen as a data structure in the same way that Java sees an array. In Java an array is a type of object that knows its own length. In C an array is just a contiguous memory area, and in most expressions the array name is converted to a pointer to its first element, so it behaves much like a constant pointer that cannot be changed. The result is that in C you can have an array, say int myInts[5];, which Java would see as an array of five ints, while in most C expressions myInts simply yields the address of the first of those five ints.

Experienced C programmers tend to treat arrays and pointers as similar constructs, but an array is not exactly a pointer: it cannot be assigned to, and sizeof applied to it gives the size of the whole array rather than the size of a pointer.

In Java if you try to access an array element out of range, say myInts[i] where i is 8, you will get a runtime error. In C the same access is undefined behaviour: you will usually not get a runtime error unless you are working with a debug build and a compiler that inserts runtime checks.

This kind of buffer overflow error is very easy to make in C by accessing an array past its number of elements. The classic example is a string copy from one char array into another where the source array has no zero termination character in it, resulting in a copy of a few hundred bytes when you expect ten or fifteen.

Upvotes: 0

Vlad from Moscow

Reputation: 311038

If you run the following simple program

#include <stdio.h>

int main( void )
{
    {
        struct Test
        {
            char str[20];
        };
        printf( "%zu\n", sizeof( struct Test ) );
    }

    {
        struct Test
        {
            char *str;
        };
        printf( "%zu\n", sizeof( struct Test ) );
    }
    return 0;
}

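you will get a result similar to the following (the second value is the size of a pointer, for example 4 on a 32-bit target and 8 on a 64-bit one)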

20
4

So the first structure contains a character array of 20 elements while the second structure contains only a pointer of type char *.

When one structure is assigned to another structure its data members are copied. So for the first structure the entire content of the array is copied into the other structure. For the second structure only the value of the pointer (the address it contains) is copied. The memory pointed to by the pointer is not copied because it is not contained in the structure itself.

Arrays are not pointers, though names of arrays used in expressions are (with rare exceptions, such as the operand of sizeof) converted to pointers to their first elements.

Upvotes: 0

Krab

Reputation: 6756

There really are 20 characters in the struct in your case; it is the same as if you had declared the struct as struct Test {char c1; char c2; ...}

If you want to copy only pointer to the string, you can change the struct declaration as below and manually manage the memory for the string via functions Test_init and Test_delete.

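#include <stdlib.h>   /* malloc, free */

struct Test{
  char* str;
};

/* Allocate len bytes for the string; str is NULL if malloc fails. */
void Test_init(struct Test* test, size_t len) {
  test->str = malloc(len);
}

/* Release the string; call this once per allocated buffer. */
void Test_delete(struct Test* test) {
  free(test->str);
}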

Upvotes: 0

Michel Billaud

Reputation: 1826

The structure contains no pointer, but 20 chars. After t2 = t1, the 20 chars of t1 are copied into t2.

Upvotes: 12
