Reputation: 4327
I am attempting to create a UNIX shell in C. If it were in Java, it would be a piece of cake, but I am not so experienced in C. Arrays in C confuse me a bit. I am not sure how to declare or access certain data structures.
I would like to create a string to read in each line. Easy enough: simply an array of characters. I would initialize it as follows:
char line[256]; //Maximum size of each line is 255 characters
And to access an element of this array, I would do as follows:
line[0] = 'a'; //Sets element 0 to 'a'
fgets( line, sizeof line, stdin ); //Gets a line from stdin and places it in line
How does declaring and using a string in this manner differ from declaring it as a pointer? From my understanding, an array in C decays to a pointer. So, would the following be equivalent?
char *line = (char*) malloc( sizeof(char) * 256 );
line[0] = 'a';
fgets( *line, sizeof(line), stdin );
When do you use the pointer character '*', and when don't you? In the example above, is including the '*' in fgets necessary, or correct?
Now, I would like to create an array of strings, or rather, an array of pointers which point to strings. Would I do so as follows?
char *arr[20]; // Declares an array of strings with 20 elements
And how would I access it?
arr[0] = "hello" // Sets element zero of arr to "hello"
Is this correct?
How would I pass this array to a function?
execvp("ls", arr); // Executes ls with argument vector arr
Is that correct, or would I use the pointer *arr? If so, why?
Now even worse, I would like an array of arrays of strings (for example, if I wanted to hold multiple argument vectors, in order to execute multiple commands in pipe sequence). Would it be declared as follows?
char **vector_arr[20]; // An array of arrays of strings
And how would I access an element of this array?
execvp("ls", vector_arr[0]); // Executes ls with first element of vector_arr as argument vector
I thought that I grasped a decent understanding of what a pointer is, and even how arrays relate to pointers, however I seem to be having trouble relating this to the actual code. I guess that when dealing with pointers, I don't know when to reference *var, var, or &var.
Upvotes: 6
Views: 4360
Reputation: 123468
Let's talk about expressions and types as they relate to arrays in C.
Arrays
When you declare an array like
char line[256];
the expression line has type "256-element array of char"; except when this expression is the operand of the sizeof or unary & operators, it will be converted ("decay") to an expression of type "pointer to char", and the value of the expression will be the address of the first element of the array. Given the above declaration, all of the following are true:
Expression     Type            Decays to    Equivalent value
----------     ----            ---------    ----------------
line           char [256]      char *       &line[0]
&line          char (*)[256]   n/a          &line[0]
*line          char            n/a          line[0]
line[i]        char            n/a          n/a
&line[0]       char *          n/a          n/a
sizeof line    size_t          n/a          Total number of bytes in array (256)
Note that the expressions line, &line, and &line[0] all yield the same value (the address of the first element of the array is the same as the address of the array itself); it's just that the types are different. In the expression &line, the array expression is the operand of the & operator, so the conversion rule above doesn't apply; instead of a pointer to char, we get a pointer to a 256-element array of char. Type matters; if you write something like the following:
char line[256];
char *linep = line;
char (*linearrp)[256] = &line;
printf( "linep + 1 = %p\n", (void *) (linep + 1) );
printf( "linearrp + 1 = %p\n", (void *) (linearrp + 1) );
you'd get different output for each line; linep + 1 would give the address of the next char (that is, of line[1]), while linearrp + 1 would give the address of the next 256-element array of char following line (256 bytes further on).
The expression line is not a modifiable lvalue; you cannot assign to it, so something like
char temp[256];
...
line = temp;
would be illegal. No storage is set aside for a variable line separate from line[0] through line[255]; there's nothing to assign to.
Because of this, when you pass an array expression to a function, what the function receives is a pointer value, not an array. In the context of a function parameter declaration, T a[N] and T a[] are interpreted as T *a; all three declare a as a pointer to T. The "array-ness" of the parameter has been lost in the course of the call.
All array accesses are done in terms of pointer arithmetic; the expression a[i] is evaluated as *(a + i). The array expression a is first converted to an expression of pointer type as per the rule above, then we offset i elements from that address and dereference the result.
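As a rough illustration of that equivalence, here is a minimal sketch with two hypothetical helper functions (the names are made up for this example). Both compute the same sum; one uses subscripts and the other spells out the pointer arithmetic that the subscript stands for.

```c
#include <stddef.h>

/* Hypothetical helper: sums n chars using explicit pointer arithmetic. */
int sum_with_pointer_arithmetic(const char *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += *(a + i);   /* offset i elements from a, then dereference */
    return total;
}

/* Hypothetical helper: the same sum written with subscripts. */
int sum_with_subscripts(const char *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];       /* evaluated as *(a + i) */
    return total;
}
```

Passing an array such as char line[4] to either function works because the array expression is converted to a pointer to its first element at the call.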
Unlike Java, C does not set aside storage for a pointer to the array separate from the array elements themselves: all that's set aside is the following:
+---+
| | line[0]
+---+
| | line[1]
+---+
...
+---+
| | line[255]
+---+
Nor does C allocate memory for arrays from the heap (for whatever definition of heap). If the array is declared auto (that is, local to a block and without the static keyword), the memory will be allocated from wherever the implementation gets memory for local variables (what most of us call the stack). If the array is declared at file scope or with the static keyword, the memory will be allocated from a different memory segment, and it will be allocated at program start and held until the program terminates.
Also unlike Java, C arrays contain no metadata about their length; C assumes you knew how big the array was when you allocated it, so you can track that information yourself.
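One common way to track the length yourself is to pass it alongside the pointer. The sketch below assumes a hypothetical helper named fill_buffer; inside it, sizeof buf would only give the size of a pointer, so the capacity has to arrive as a separate argument.

```c
#include <stddef.h>

/* Hypothetical helper: fills buf with ch and 0-terminates it.
   The caller supplies the capacity, because the function only
   receives a pointer and cannot recover the array size itself. */
size_t fill_buffer(char *buf, size_t capacity, char ch)
{
    if (capacity == 0)
        return 0;

    size_t i;
    for (i = 0; i + 1 < capacity; i++)
        buf[i] = ch;
    buf[i] = '\0';

    return i;   /* characters written, not counting the terminator */
}
```

With a true array in scope, the call can be written as fill_buffer( line, sizeof line, 'x' ), since sizeof line still reports the whole array at that point.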
Pointers
When you declare a pointer like
char *line;
the expression line has type "pointer to char" (duh). Enough storage is set aside to store the address of a char object. Unless you declare it at file scope or with the static keyword, it won't be initialized and will contain some random bit pattern that may or may not correspond to a valid address. Given the above declaration, all of the following are true:
Expression     Type       Decays to    Equivalent value
----------     ----       ---------    ----------------
line           char *     n/a          n/a
&line          char **    n/a          n/a
*line          char       n/a          line[0]
line[i]        char       n/a          n/a
&line[0]       char *     n/a          n/a
sizeof line    size_t     n/a          Total number of bytes in a char pointer
                                       (anywhere from 2 to 8 depending on the platform)
In this case, line and &line do give us different values, as well as different types; line is a simple scalar object, so &line gives us the address of that object. Again, array accesses are done in terms of pointer arithmetic, so line[i] works the same whether line is declared as an array or as a pointer.
So when you write
char *line = malloc( sizeof *line * 256 ); // note no cast, sizeof expression
this is the case that works like Java; you have a separate pointer variable that references storage that's allocated from the heap, like so:
+---+
| | line -------+
+---+ |
... |
+---+ |
| | line[0] <---+
+---+
| | line[1]
+---+
...
+---+
| | line[255]
+---+
Unlike Java, C won't automatically reclaim this memory when there are no more references to it. You'll have to explicitly deallocate it when you're finished with it using the free library function:
free( line );
As for your specific questions:
fgets( *line, sizeof(line), stdin );
When do you use the pointer character '*', and when don't you? In the example above, is including the '*' in fgets necessary, or correct?
It is not correct; fgets expects the first argument to have type "pointer to char"; the expression *line has type char. This follows from the declaration:
char *line;
Secondly, sizeof(line) only gives you the size of the pointer, not the size of what the pointer points to; unless you want to read exactly sizeof (char *) bytes, you'll have to use a different expression to specify the number of characters to read:
fgets( line, 256, stdin );
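To make the point concrete, here is a minimal sketch of a line reader that works the same for an array and for a malloc'd buffer, because in both cases it receives a char * plus an explicit size. The name read_line and the newline stripping are assumptions added for illustration; they are not part of the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from in into buf, which has room
   for size bytes, and strips the trailing newline if one was read.
   Returns 1 on success, 0 on end of file or error.
   Assumes size fits in an int, as fgets requires. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int) size, in) == NULL)
        return 0;

    buf[strcspn(buf, "\n")] = '\0';   /* cut the string at the newline */
    return 1;
}
```

For char line[256] the call would be read_line( stdin, line, sizeof line ); for a buffer obtained from malloc( 256 ) the size has to be repeated by hand, as in read_line( stdin, line, 256 ).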
Now, I would like to create an array of strings, or rather, an array of pointers which point to strings. Would I do so as follows?
char *arr[20]; // Declares an array of strings with 20 elements
C doesn't have a separate "string" datatype the way C++ or Java do; in C, a string is simply a sequence of character values terminated by a 0. They are stored as arrays of char. Note that all you've declared above is a 20-element array of pointers to char; those pointers can point to things that aren't strings.
If all of your strings are going to have the same maximum length, you can declare a 2D array of char like so:
char arr[NUM_STRINGS][MAX_STRING_LENGTH + 1]; // +1 for 0 terminator
and then you would assign each string as
strcpy( arr[i], "some string" );
strcpy( arr[j], some_other_variable );
strncpy( arr[k], another_string_variable, MAX_STRING_LENGTH ); // destination, source, count
although beware of strncpy; it won't append the 0 terminator to the destination if the source string is at least as long as the count you pass. You'll have to make sure the terminator is present before trying to use the result with the rest of the string library.
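One way to guard against the missing terminator is to wrap strncpy in a small helper that always writes it. This is only a sketch; copy_string is a hypothetical name, and silently truncating long input is an assumption that may not suit every program.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst, which has room for dst_size
   bytes, truncating if necessary and always 0-terminating the result. */
void copy_string(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;

    strncpy(dst, src, dst_size - 1);   /* leave room for the terminator */
    dst[dst_size - 1] = '\0';          /* strncpy may not have written one */
}
```

With the 2D array above, a call would look like copy_string( arr[k], sizeof arr[k], another_string_variable ).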
If you want to allocate space for each string separately, you can declare the array of pointers, then allocate each pointer:
char *arr[NUM_STRINGS];
...
arr[i] = malloc( strlen("some string") + 1 );
strcpy( arr[i], "some string" );
...
arr[j] = strdup( "some string" ); // not available in all implementations, calls
// malloc under the hood
...
arr[k] = "some string"; // arr[k] contains the address of the *string literal*
// "some string"; note that you may not modify the contents
// of a string literal (the behavior is undefined), so
// arr[k] should not be used as an argument to any function
// that tries to modify the input parameter.
Note that each element of arr is a pointer value; whether these pointers point to strings (0-terminated sequences of char) or not is up to you.
Now even worse, I would like an array of arrays of strings (for example, if I wanted to hold multiple argument vectors, in order to execute multiple commands in pipe sequence). Would it be declared as follows?
char **vector_arr[20]; // An array of arrays of strings
What you've declared is an array of pointers to pointers to char; note that this is perfectly valid if you don't know how many pointers to char you need to store in each element. However, if you know the maximum number of arguments per element, it may be clearer to write
char *vector_arr[20][N];
Otherwise, you'd have to allocate each array of char * dynamically:
char **vector_arr[20] = { NULL }; // initialize all the pointers to NULL
size_t i, j;
for ( i = 0; i < 20; i++ )
{
  // the type of the expression vector_arr is 20-element array of char **, so
  // the type of the expression vector_arr[i] is char **, so
  // the type of the expression *vector_arr[i] is char *, so
  // the type of the expression vector_arr[i][j] is char *, so
  // the type of the expression *vector_arr[i][j] is char
  vector_arr[i] = malloc( sizeof *vector_arr[i] * num_args_for_this_element );
  if ( vector_arr[i] )
  {
    for ( j = 0; j < num_args_for_this_element; j++ )
    {
      vector_arr[i][j] = malloc( sizeof *vector_arr[i][j] * (size_of_this_element + 1) );
      // assign the argument, but only if the allocation succeeded
      if ( vector_arr[i][j] )
        strcpy( vector_arr[i][j], argument_for_this_element );
    }
  }
}
So, each element of vector_arr points to the first element of an N-element array of pointers, each of which points to the first element of an M-element array of char.
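Everything allocated this way has to be released in the reverse order: the strings first, then the array of pointers that held them. The sketch below uses a hypothetical helper, free_arg_vectors, and assumes that each inner array ends with a NULL pointer (the convention execvp uses) so that its length does not need to be stored separately.

```c
#include <stdlib.h>

/* Hypothetical helper. Assumption: every non-NULL vector_arr[i] came from
   malloc, holds malloc'd strings, and is terminated by a NULL pointer. */
void free_arg_vectors(char **vector_arr[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (vector_arr[i] == NULL)
            continue;                      /* nothing was allocated here */

        for (size_t j = 0; vector_arr[i][j] != NULL; j++)
            free(vector_arr[i][j]);        /* free each string */

        free(vector_arr[i]);               /* then the array of pointers */
        vector_arr[i] = NULL;              /* avoid a dangling pointer */
    }
}
```

For the declaration above, the call would be free_arg_vectors( vector_arr, 20 ).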
Upvotes: 7
Reputation: 17724
You're really on the right track.
In your second example, where you use malloc(), the fgets() call would look like this:
fgets( line, 256, stdin ); /* vs. fgets( *line, sizeof(line), ... ) as you have; sizeof(line) is only the size of the pointer */
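The reason for this is that in C an array name is converted to a pointer to its first element in most expressions, including when it is passed to a function (the array itself is not a pointer). So:
char line[256];
declares (and defines) an array called line made up of 256 bytes of memory, with its size fixed at compile time (probably on the stack).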
char *line;
declares only a pointer; the memory it points to is not assigned by the compiler. When you call malloc and assign the return value to line (the cast to char * is optional in C), the memory is allocated dynamically on the heap.
When passed to a function, though, line is a char * (pointer to char) either way, and if you look at the declaration of fgets in the <stdio.h> file, you'll see what it expects as its first argument:
char *fgets(char * restrict str, int size, FILE * restrict stream);
... namely a char *. So you could pass line either way you declared it (as a pointer or as an array).
With respect to your other questions:
char *arr[20];
declares an array of 20 uninitialized pointers to char. To use this array, you would iterate 20 times over the elements of arr and assign each one the result of a malloc() call:
arr[0] = (char *) malloc( sizeof(char) * 256 );
arr[1] = (char *) malloc( sizeof(char) * 256 );
...
arr[19] = (char *) malloc( sizeof(char) * 256 );
Then you could use each of the 20 strings. To pass the second one to fgets, which expects a char * as its first argument, you would do this:
fgets( arr[1], ... );
Then fgets gets the char * it expects.
Be aware of course that you have to call malloc() before you attempt this or arr[1] would be uninitialized.
Your example using execvp() is correct (assuming you allocated all these strings with malloc() first). vector_arr[0] is a char **, which execvp() expects. [Remember also that execvp() expects the last pointer of your vector array to have the value NULL; see the man page for clarification.]
Note that execvp() is declared like so (see <unistd.h>):
int execvp(const char *file, char *const argv[]);
Removing the const attribute for clarity, it could also have been declared like so:
int execvp( const char *file, char **argv );
In a function parameter list, the declaration char **array is functionally equivalent to char *array[].
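Since the goal is a shell, one possible way to build a NULL-terminated argument vector of the kind execvp() expects is to split the input line in place. This is a sketch only: split_line is a hypothetical helper, it splits on spaces, tabs and newlines with strtok, and it ignores quoting and pipes entirely.

```c
#include <string.h>

/* Hypothetical helper: splits line in place on whitespace and stores
   pointers to the pieces in argv, followed by a NULL pointer.
   max_args is the capacity of argv, including the slot for NULL.
   Returns the number of arguments stored. line must be modifiable
   (an array or malloc'd buffer, never a string literal). */
size_t split_line(char *line, char *argv[], size_t max_args)
{
    size_t argc = 0;

    if (max_args == 0)
        return 0;

    for (char *tok = strtok(line, " \t\n");
         tok != NULL && argc + 1 < max_args;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;    /* points into line; nothing is copied */

    argv[argc] = NULL;         /* terminator that execvp() requires */
    return argc;
}
```

After a call such as split_line( line, arr, 20 ), a child process could run the command with execvp( arr[0], arr ). Note that arr is passed as is, without * or &, because the array expression is converted to the char ** that execvp() expects.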
Remember also that in every example where we use malloc(), you'll have to at some point use a corresponding free() or you'll leak memory.
I'll also point out that, generally speaking, although you can do an array of vectors (and arrays of arrays of vectors and so on), as you extend your arrays more and more dimensionally you'll find the code gets harder and harder to understand and maintain. Of course you should learn how this all works and practice until you understand it fully, but if in the course of designing your code you find yourself thinking you need arrays of arrays of arrays you are probably overcomplicating things.
Upvotes: 3
Reputation: 15121
Here is a partial answer to the OP.
char *line = (char*) malloc( sizeof(char) * 256 );
line[0] = 'a';
fgets( *line, sizeof(line), stdin );
The arguments to fgets() are wrong; the call should be fgets( line, 256, stdin );.
Explanation:
fgets() expects its first argument to be a char *, so you can use a pointer to char or an array of char (the array name will decay to char * in this case).
When used as an argument to a function, an array name will decay to a pointer.
Because line is a pointer, sizeof(line) will give you the size of a pointer (usually 4 on a 32-bit system); but if line is an array, such as char line[100], sizeof(line) will give you the size of the array, in this case 100 * sizeof(char).
When used as the operand of the sizeof operator, an array name will not decay to a pointer.
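The difference can be checked with a small sketch. The function name size_inside_function is made up for this example; it only reports what sizeof sees once the array has been passed to a function.

```c
#include <stddef.h>

/* Hypothetical helper: by the time an array reaches a function it is
   a pointer, so sizeof reports the size of a pointer here. */
size_t size_inside_function(char *param)
{
    return sizeof param;   /* same as sizeof(char *) */
}
```

Given char line[100]; in the caller, sizeof line is 100 there, while size_inside_function( line ) returns sizeof(char *), whatever that is on the platform.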
Upvotes: 2