Johndt

Reputation: 4327

Creating arrays in C

I am attempting to create a UNIX shell in C. If it were in Java, it would be a piece of cake, but I am not so experienced in C. Arrays in C confuse me a bit. I am not sure how to declare or access certain data structures.

I would like to create a string to read in each line. Easy enough: simply an array of characters. I would initialize it as follows:

char line[256]; //Maximum size of each line is 255 characters

And to access an element of this array, I would do as follows:

line[0] = 'a'; //Sets element 0 to 'a'
fgets( line, sizeof line, stdin ); //Gets a line from stdin and places it in line

How does declaring and using a string in this manner differ from declaring it as a pointer? From my understanding, an array in C decays to a pointer. So, would the following be equivalent?

char *line = (char*) malloc( sizeof(char) * 256 );
line[0] = 'a';
fgets( *line, sizeof(line), stdin );

When do you use the pointer character '*', and when don't you? In the example above, is including the '*' in fgets necessary, or correct?

Now, I would like to create an array of strings, or rather, an array of pointers which point to strings. Would I do so as follows?

char *arr[20]; // Declares an array of strings with 20 elements

And how would I access it?

arr[0] = "hello"; // Sets element zero of arr to "hello"

Is this correct?

How would I pass this array to a function?

execvp("ls", arr); // Executes ls with argument vector arr

Is that correct, or would I use the pointer *arr? If so, why?

Now even worse, I would like an array of arrays of strings (for example, if I wanted to hold multiple argument vectors, in order to execute multiple commands in pipe sequence). Would it be declared as follows?

char **vector_arr[20]; // An array of arrays of strings

And how would I access an element of this array?

execvp("ls", vector_arr[0]); // Executes ls with first element of vector_arr as argument vector

I thought I had a decent grasp of what a pointer is, and even of how arrays relate to pointers, but I seem to be having trouble relating this to actual code. I guess that when dealing with pointers, I don't know when to use *var, var, or &var.

Upvotes: 6

Views: 4360

Answers (3)

John Bode

Reputation: 123468

Let's talk about expressions and types as they relate to arrays in C.

Arrays

When you declare an array like

char line[256];

the expression line has type "256-element array of char"; except when this expression is the operand of the sizeof or unary & operators, it will be converted ("decay") to an expression of type "pointer to char", and the value of the expression will be the address of the first element of the array. Given the above declaration, all of the following are true:

 Expression             Type            Decays to            Equivalent value
 ----------             ----            ---------            ----------------
       line             char [256]      char *               &line[0]
      &line             char (*)[256]   n/a                  &line[0]
      *line             char            n/a                  line[0]
    line[i]             char            n/a                  n/a
   &line[0]             char *          n/a                  n/a
sizeof line             size_t          n/a                  Total number of bytes 
                                                               in array (256)

Note that the expressions line, &line, and &line[0] all yield the same value (the address of the first element of the array is the same as the address of the array itself), it's just that the types are different. In the expression &line, the array expression is the operand of the & operator, so the conversion rule above doesn't apply; instead of a pointer to char, we get a pointer to a 256-element array of char. Type matters; if you write something like the following:

char line[256];
char *linep = line;
char (*linearrp)[256] = &line;

printf( "linep    + 1 = %p\n", (void *) (linep + 1) );
printf( "linearrp + 1 = %p\n", (void *) (linearrp + 1) );

you'd get different output for each line; linep + 1 would give the address of line[1], the next char after line[0], while linearrp + 1 would give the address of the next 256-element array of char following line (256 bytes past the start of line).

The expression line is not a modifiable lvalue; you cannot assign to it, so something like

char temp[256];
...
line = temp;

would be illegal. No storage is set aside for a variable line separate from line[0] through line[255]; there's nothing to assign to.

Because of this, when you pass an array expression to a function, what the function receives is a pointer value, not an array. In the context of a function parameter declaration, T a[N] and T a[] are interpreted as T *a; all three declare a as a pointer to T. The "array-ness" of the parameter has been lost in the course of the call.

All array accesses are done in terms of pointer arithmetic; the expression a[i] is evaluated as *(a + i). The array expression a is first converted to an expression of pointer type as per the rule above, then we offset i elements from that address and dereference the result.

Unlike Java, C does not set aside storage for a pointer to the array separate from the array elements themselves: all that's set aside is the following:

+---+
|   | line[0]
+---+
|   | line[1]
+---+
 ...
+---+
|   | line[255]
+---+

Nor does C allocate memory for arrays from the heap (for whatever definition of heap). If the array is declared auto (that is, local to a block and without the static keyword), the memory will be allocated from wherever the implementation gets memory for local variables (what most of us call the stack). If the array is declared at file scope or with the static keyword, the memory will be allocated from a different memory segment, and it will be allocated at program start and held until the program terminates.

Also unlike Java, C arrays contain no metadata about their length; C assumes you knew how big the array was when you allocated it, so you have to track that information yourself.

Pointers

When you declare a pointer like

char *line;

the expression line has type "pointer to char" (duh). Enough storage is set aside to store the address of a char object. Unless you declare it at file scope or with the static keyword, it won't be initialized and will contain some random bit pattern that may or may not correspond to a valid address. Given the above declaration, all of the following are true:

 Expression             Type            Decays to            Equivalent value
 ----------             ----            ---------            ----------------
       line             char *          n/a                  n/a
      &line             char **         n/a                  n/a
      *line             char            n/a                  line[0]
    line[i]             char            n/a                  n/a
   &line[0]             char *          n/a                  n/a
sizeof line             size_t          n/a                  Total number of bytes
                                                               in a char pointer
                                                               (anywhere from 2 to
                                                               8 depending on the
                                                               platform)

In this case, line and &line do give us different values, as well as different types; line is a simple scalar object, so &line gives us the address of that object. Again, array accesses are done in terms of pointer arithmetic, so line[i] works the same whether line is declared as an array or as a pointer.

So when you write

char *line = malloc( sizeof *line * 256 ); // note no cast, sizeof expression

this is the case that works like Java; you have a separate pointer variable that references storage that's allocated from the heap, like so:

+---+ 
|   | line -------+
+---+             |
 ...              |
+---+             |
|   | line[0] <---+
+---+
|   | line[1]
+---+
 ...
+---+
|   | line[255]
+---+

Unlike Java, C won't automatically reclaim this memory when there are no more references to it. You'll have to explicitly deallocate it when you're finished with it using the free library function:

free( line );

As for your specific questions:

fgets( *line, sizeof(line), stdin );

When do you use the pointer character '*', and when don't you? In the example above, is including the '*' in fgets necessary, or correct?

It is not correct; fgets expects the first argument to have type "pointer to char"; the expression *line has type char. This follows from the declaration:

char *line; 

Secondly, sizeof(line) only gives you the size of the pointer, not the size of what the pointer points to; unless you want to read at most sizeof (char *) - 1 characters, you'll have to use a different expression to specify the buffer size:

fgets( line, 256, stdin );

Now, I would like to create an array of strings, or rather, an array of pointers which point to strings. Would I do so as follows?

char *arr[20]; // Declares an array of strings with 20 elements

C doesn't have a separate "string" datatype the way C++ or Java do; in C, a string is simply a sequence of character values terminated by a 0. They are stored as arrays of char. Note that all you've declared above is a 20-element array of pointers to char; those pointers can point to things that aren't strings.

If all of your strings are going to have the same maximum length, you can declare a 2D array of char like so:

char arr[NUM_STRINGS][MAX_STRING_LENGTH + 1]; // +1 for 0 terminator

and then you would assign each string as

strcpy( arr[i], "some string" );
strcpy( arr[j], some_other_variable );
strncpy( arr[k], another_string_variable, MAX_STRING_LENGTH );

although beware of strncpy; it won't append the 0 terminator to the destination if the source string is MAX_STRING_LENGTH characters or longer (that is, if the copy fills all n characters). You'll have to make sure the terminator is present, for example by setting arr[k][MAX_STRING_LENGTH] = 0, before using the string with the rest of the string library.

If you want to allocate space for each string separately, you can declare the array of pointers, then allocate each pointer:

char *arr[NUM_STRINGS];
...
arr[i] = malloc( strlen("some string") + 1 );
strcpy( arr[i], "some string" );
...
arr[j] = strdup( "some string" ); // not available in all implementations, calls
                                  // malloc under the hood
...
arr[k] = "some string";  // arr[k] contains the address of the *string literal*
                         // "some string"; note that you may not modify the contents
                         // of a string literal (the behavior is undefined), so 
                         // arr[k] should not be used as an argument to any function
                         // that tries to modify the input parameter.

Note that each element of arr is a pointer value; whether these pointers point to strings (0-terminated sequences of char) or not is up to you.

Now even worse, I would like an array of arrays of strings (for example, if I wanted to hold multiple argument vectors, in order to execute multiple commands in pipe sequence). Would it be declared as follows?

char **vector_arr[20]; // An array of arrays of strings

What you've declared is an array of pointers to pointers to char; note that this is perfectly valid if you don't know how many pointers to char you need to store in each element. However, if you know the maximum number of arguments per element, it may be clearer to write

char *vector_arr[20][N];

Otherwise, you'd have to allocate each array of char * dynamically:

char **vector_arr[20] = { NULL }; // initialize all the pointers to NULL
size_t i, j;

for ( i = 0; i < 20; i++ )
{
  // the type of the expression vector_arr is 20-element array of char **, so
  // the type of the expression vector_arr[i] is char **, so
  // the type of the expression *vector_arr[i] is char *, so
  // the type of the expression vector_arr[i][j] is char *, so
  // the type of the expression *vector_arr[i][j] is char

  // + 1 leaves room for the NULL terminator that execvp expects
  vector_arr[i] = malloc( sizeof *vector_arr[i] * (num_args_for_this_element + 1) );
  if ( vector_arr[i] )
  {
    for ( j = 0; j < num_args_for_this_element; j++ )
    {
      vector_arr[i][j] = malloc( sizeof *vector_arr[i][j] * (size_of_this_element + 1) );
      // assign the argument
      if ( vector_arr[i][j] )
        strcpy( vector_arr[i][j], argument_for_this_element );
    }
    vector_arr[i][j] = NULL; // terminate the argument vector
  }
}

So, each element of vector_arr points to the first of N pointers (one per argument), each of which points to an M-element array of char.

Upvotes: 7

par

Reputation: 17724

You're really on the right track.

In your second example, where you use malloc(), the fgets() command would be called like so:

fgets( line, 256, stdin ); /* vs. fgets( *line ... ) as you have; sizeof(line) would only give the size of the pointer */

The reason for this is that in C an array name is converted ("decays") to a pointer to its first element in most expressions, including when it is passed to a function. So:

char line[256];

declares (and defines) an array called line of 256 chars, allocated automatically (probably on the stack); when you pass line to a function, it decays to a pointer to line[0].

char *line; declares a pointer, but the memory it points to is not provided by the compiler. When you call malloc, its void * return value is converted to char * (in C no cast is required) and assigned to line, so the memory is allocated dynamically on the heap.

Functionally, though, in both cases what gets passed to fgets is a char * (pointer to char), and if you look at the declaration of fgets in <stdio.h>, you'll see what it expects as its first argument:

char *fgets(char * restrict str, int size, FILE * restrict stream);

... namely a char *. So you could pass line either way you declared it (as a pointer or as an array).

With respect to your other questions:

char *arr[20]; declares 20 pointers to char (uninitialized, if arr is a local variable). To use this array, you would iterate 20 times over the elements of arr and assign each one the result of a call to malloc():

arr[0] = (char *) malloc( sizeof(char) * 256 );
arr[1] = (char *) malloc( sizeof(char) * 256 );
...
arr[19] = (char *) malloc( sizeof(char) * 256 );

Then you could use each of the 20 strings. To pass the second one to fgets, which expects a char * as its first argument, you would do this:

fgets( arr[1], ... );

Then fgets gets the char * it expects.

Be aware of course that you have to call malloc() before you attempt this or arr[1] would be uninitialized.

Your example using execvp() is correct (assuming every element you use points to a valid string first, e.g. one allocated with malloc()). vector_arr[0] is a char **, which is what execvp() expects. [Remember also that execvp() expects the last pointer of your argument vector to have the value NULL; see the man page for clarification.]

Note that execvp() is declared like so (see <unistd.h>)

int execvp(const char *file, char *const argv[]);

Ignoring the const qualifier for clarity, it could also have been declared like so:

int execvp( const char *file, char **argv );

In a function parameter list, char **array is equivalent to char *array[].

Remember also that in every example where we use malloc(), you'll have to at some point use a corresponding free() or you'll leak memory.

I'll also point out that, generally speaking, although you can do an array of vectors (and arrays of arrays of vectors and so on), as you extend your arrays more and more dimensionally you'll find the code gets harder and harder to understand and maintain. Of course you should learn how this all works and practice until you understand it fully, but if in the course of designing your code you find yourself thinking you need arrays of arrays of arrays you are probably overcomplicating things.

Upvotes: 3

Lee Duhem

Reputation: 15121

Here is a partial answer to the OP's question.

char *line = (char*) malloc( sizeof(char) * 256 );
line[0] = 'a';
fgets( *line, sizeof(line), stdin );

The arguments to fgets() are wrong; it should be fgets( line, 256, stdin );.

Explanation:

  1. fgets() expects its first argument to be a char *, so you can pass a pointer to char or an array of char (the array name decays to a char * in this case).

    When used as an argument to a function, an array name decays to a pointer.

  2. Because line is a pointer, sizeof(line) gives you the size of a pointer (usually 4 on a 32-bit system, 8 on a 64-bit one); but if line is an array, such as char line[100], sizeof(line) gives you the size of the array, in this case 100 * sizeof(char).

    When used as the operand of the sizeof operator, an array name does not decay to a pointer.

Upvotes: 2
