Reputation: 1201
I'm trying to parse argv
back into a single string to use for a system call.
The string shows up fine through the printf
call (even without the \0
terminator), but using it as a parameter to system creates all sorts of undefined behaviour.
How can I ensure the string is properly terminated?
Is there a better and more reliable way to go about parsing char[][]
into char[]
?
#include <stdio.h>
int main(int argc,char *argv[]){
char cmd[255]="tcc\\tcc.exe ";
char**ptr=argv+1;
while(*ptr){
strcat(cmd,*ptr);
++ptr;
if(*ptr)cmd[strlen(cmd)]=' ';
}
printf("cmd: ***%s***\n",cmd);
system(cmd);
}
I just discovered and corrected another flaw in this code, a system commands needs (escaped) backslashes for file paths
Upvotes: 0
Views: 867
Reputation: 57453
This instruction:
if(*ptr)
cmd[strlen(cmd)]=' ';
else
cmd[strlen(cmd)]='\0';
will break cmd
, because it will overwrite its zero termination. Try instead:
l = strlen(cmd);
if (*ptr) {
cmd[l++] = ' ';
}
cmd[l] = 0x0;
This will append a space, and zero terminate the string. Actually, since it is already zero terminated, you could do better:
if (*ptr) {
int l = strlen(cmd);
cmd[l++] = ' ';
cmd[l ] = 0x0;
}
A better alternative could be this:
int main(int argc, char *argv[])
{
char cmd[255]="tcc/tcc.exe";
char **ptr=argv+1;
for (ptr = argv+1; *ptr; ptr++)
{
strncat(cmd, " ", sizeof(cmd)-strlen(cmd));
strncat(cmd, *ptr, sizeof(cmd)-strlen(cmd));
}
printf("String: '%s'.\n", cmd);
return 0;
}
We use strncat()
to check that we're not overrunning the cmd
buffer, and the space gets applied in advance. This way there's no extra space at the end of the string.
It is true that strncat()
is a mite slower than directly assigning cmd[]
, but factoring the safety and debugging time, I think it's worthwhile.
OK, so let's try to do this fast. We keep track of what cmd
's length ought to be in a variable, and copy the string with memcpy()
which is slightly faster than strcpy()
and does neither check string length, nor copy the extra zero at end of string.
(This saves something - remember that strcat()
has to implicitly calculate the strlen
of both its arguments. Here we save that).
int main(int argc, char *argv[])
{
#define MAXCMD 255
char cmd[MAXCMD]="tcc/tcc.exe";
int cmdlen = strlen(cmd);
char **ptr=argv+1;
for (ptr = argv+1; *ptr; ptr++)
{
/* How many bytes do we have to copy? */
int l = strlen(*ptr);
/* STILL, this check HAS to be done, or the program is going to crash */
if (cmdlen + 1 + l + 1 < MAXCMD)
{
/* No danger of crashing */
cmd[cmdlen++] = ' ';
memcpy(cmd + cmdlen, *ptr, l);
cmdlen += l;
}
else
{
printf("Buffer too small!\n");
}
}
cmd[cmdlen] = 0x0;
printf("String: '%s'.\n", cmd);
return 0;
}
It is possible to try and be smarter than the compiler's usually built-in strlen
and memcpy
instructions (file under: "Bad ideas"), and do without strlen() altogether. This translates into a smaller inner loop, and when strlen
and memcpy
are implemented with library calls, much faster performances (look ma, no stack frames!).
int main(int argc, char *argv[])
{
#define MAXCMD 254
char cmd[MAXCMD+1]="tcc/tcc.exe";
int cmdlen = 11; // We know initial length of "tcc/tcc.exe"!
char **ptr;
for (ptr = argv+1; *ptr; ptr++)
{
cmd[cmdlen++] = ' ';
while(**ptr) {
cmd[cmdlen++] = *(*ptr)++;
if (MAXCMD == cmdlen)
{
fprintf(stderr, "BUFFER OVERFLOW!\n");
return -1;
}
}
}
cmd[cmdlen] = 0x0;
printf("String: '%s'.\n", cmd);
return 0;
}
Shamelessly cribbed from many a lecture I received from professors I thought shortsighted, until they were proved right each and every time.
The problem here is to exactly circumscribe what are we doing - what's the forest this particular tree is part of.
We're building a command line that will be fed to a exec()
call, which means that the OS will have to build another process environment and allocate and track resources. Let's step a bit backwards: an operation will be run that will take about one millisecond, and we're feeding it a loop that might take ten microseconds instead of twenty.
The 20:10 (that's 50%!) improvement we have on the inner loop translates in a 1020:1010 (that's about 1%) just the overall process startup operation. Let's imagine the process takes half a second - five hundred milliseconds - to complete, and we're looking at 500020:500010 or 0.002% improvement, in accord with the never-sufficiently-remembered http://en.wikipedia.org/wiki/Amdahl%27s_law .
Or let's put it another way. One year hence, we will have run this program, say, one billion times. Those 10 microseconds saved now translate to a whopping 10.000 seconds, or around two hours and three quarters. We're starting to talk big, except that to obtain this result we've expended sixteen hours coding, checking and debugging :-)
The double-strncat()
solution (which is actually the slowest) supplies code that is easier to read and understand, and modify. And reuse. The fastest solution, above, implicitly relies on the separator being one character, and this fact is not immediately apparent. Which means that reusing the fastest solution with ", " as separator (let's say we need this for CSV or SQL) will now introduce a subtle bug.
When designing an algorithm or piece of code, it is wise to factor not only tightness of code and local ("keyhole") performances, but also things like:
This in answer to WilliamMorris's question, so I'll use his code, but mine has the same problem (actually, mine is - not completely unintentionally - much worse).
This is the original functionality from William's code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
#define CMD "tcc/tcc.exe"
char cmd[255] = CMD;
char *s = cmd + sizeof CMD - 1;
const char *end = cmd + sizeof cmd - 1;
// Cycle syntax modified to pre-C99 - no consequences on our code
int i;
for (i = 1; i < argc; ++i) {
size_t len = strlen(argv[i]);
if (s + len >= end) {
fprintf(stderr, "Buffer overrun!\n");
exit(1);
}
// Here (will) be dragons
//*s++ = '.';
//*s++ = '.';
//*s++ = '.';
*s++ = ' ';
memcpy(s, argv[i], len);
s += len;
}
*s = '\0';
// Get also string length, which should be at most 254
printf("%s: string length is %d\n", cmd, (int)strlen(cmd));
return 0;
}
The buffer overrun check verifies that the string written so far, plus the string that has yet to be written, together do not exceed the buffer. The length of the separator itself is not counted, but things will work out somehow:
size_t len = strlen(argv[i]);
if (s + len >= end) {
fprintf(stderr, "Buffer overrun!\n");
exit(1);
}
Now we add on the separator in the most expeditious way - by repeating the poke:
*s++ = ', ';
*s++ = ' ';
Now if s + len
is equal to end - 1
, the check will pass. We now add two bytes. The total length will be s + len + 2, which is equal to end plus one:
tcc/tcc.exe, It, was, the, best, of, times, it, was, the, worst, of, times, it, was, the, age, of, wisdom, it, was, the, age, of, foolishness, it, was, the, epoch, of, belief, it, was, the, epoch, of, incredulity, it, was, the, season, of, Light, it, was: string length is 254
tcc/tcc.exe, It, was, the, best, of, times, it, was, the, worst, of, times, it, was, the, age, of, wisdom, it, was, the, age, of, foolishness, it, was, the, epoch, of, belief, it, was, the, epoch, of, incredulity, it, was, the, season, of, Light, it, ouch: string length is 255
With a longer separator, such as "... ", the problem is even more evident:
tcc/tcc.exe... It... was... the... best... of... times... it... was... the... worst... of... times... it... was... the... age... of... wisdom... it... was... the... age... of... foolishness... it... was... the... epoch... of... belief... it... was... longer: string length is 257
In my version, the fact that the check requires an exact match leads to catastrophic results, since once the buffer is overrun, the match will always fail and result in a massive memory overwrite.
If we modify my version with
if (cmdlen >= MAXCMD)
we will get a code that always intercepts buffer overruns, but still does not prevent them up to the delimiter's length minus two; i.e., a hypothetical delimiter 20 bytes long could overwrite 18 bytes past cmd
's buffer before being caught.
I would point out that this is not to say that my code had a catastrophic bug (and so, once fixed, it'll live happy ever after); the point was that the code was structured in such a way that, for the sake of squeezing a little speed, a dangerous bug could easily go unnoticed, or the same bug could easily be introduced upon reuse of what looked like "safe and tested" code. This is a situation that one would be well advised to avoid.
(I'll come clean now, and confess that I myself rarely did... and too often still don't).
Upvotes: 3
Reputation: 3684
This might be a more complicated than you want but it avoids buffer overflows and it also exits if the buffer is too small. Note that continuing to loop once there is not enough space in the buffer for argv[N]
can result in trailing strings (argv[N+1]
etc) that are shorter than argv[N]
being added to the string even though argv[N]
was omitted...
Note that I'm using memcpy
, because by that point I already know how long argv[i]
is.
int main(int argc, char **argv)
{
#define CMD "tcc/tcc.exe"
char cmd[255] = CMD;
char *s = cmd + sizeof CMD - 1;
const char *end = cmd + sizeof cmd - 1;
for (int i = 1; i < argc; ++i) {
size_t len = strlen(argv[i]);
if (s + len >= end) {
fprintf(stderr, "Buffer overrun!\n");
exit(1);
}
*s++ = ' ';
memcpy(s, argv[i], len);
s += len;
}
*s = '\0';
printf("%s\n", cmd);
return 0;
}
Upvotes: 1