Bill the Lizard

Reputation: 405955

What are some of the drawbacks to using C-style strings?

I know that buffer overruns are one potential hazard to using C-style strings (char arrays). If I know my data will fit in my buffer, is it okay to use them anyway? Are there other drawbacks inherent to C-style strings that I need to be aware of?

EDIT: Here's an example close to what I'm working on:

char buffer[1024];
char * line = NULL;
while ((line = fgets(fp)) != NULL) { // this won't compile, but that's not the issue
    // parse one line of command output here.
}

This code is taking data from a FILE pointer that was created using a popen("df") command. I'm trying to run Linux commands and parse their output to get information about the operating system. Is there anything wrong (or dangerous) with setting the buffer to some arbitrary size this way?

Upvotes: 8

Views: 4051

Answers (16)

Walter Bright

Reputation: 4307

There are a few disadvantages to C strings:

  1. Getting the length is a relatively expensive operation.
  2. No embedded nul characters are allowed.
  3. The signed-ness of chars is implementation defined.
  4. The character set is implementation defined.
  5. The size of the char type in bits (CHAR_BIT) is implementation defined.
  6. Have to keep track separately of how each string is allocated and so how it must be free'd, or even if it needs to be free'd at all.
  7. No way to refer to a slice of the string as another string.
  8. Strings are not immutable, meaning they must be synchronized separately.
  9. Strings cannot be manipulated at compile time.
  10. Switch cases cannot be strings.
  11. The C preprocessor does not recognize strings in expressions.
  12. Cannot pass strings as template arguments (C++).
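
Points 1, 2 and 7 can be seen in a few lines. This is only a rough sketch: the names packed, c_length and copy_slice are made up for illustration and are not from the answer above.

```c
#include <stddef.h>
#include <string.h>

/* "abc", an embedded NUL, then "def"; the compiler appends a final NUL. */
static const char packed[] = "abc\0def";

/* Length as the C library sees it: strlen() walks the bytes until the
   first NUL, so the cost grows with the length of the string. */
static size_t c_length(const char *s)
{
    return strlen(s);
}

/* Hypothetical helper: copy the slice s[start, start + count) into out.
   A slice has no terminator of its own, so it has to be copied and
   terminated before it can be handed to functions expecting a C string. */
static int copy_slice(char *out, size_t out_size,
                      const char *s, size_t start, size_t count)
{
    if (count + 1 > out_size)
        return -1;              /* would not fit: refuse instead of overflowing */
    memcpy(out, s + start, count);
    out[count] = '\0';
    return 0;
}
```

Here c_length(packed) stops at the embedded NUL even though sizeof packed covers all the bytes, and getting "def" out as its own string requires a copy.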

Upvotes: 21

JohnMcG

Reputation: 8815

Another consideration is who will be maintaining your code. What about in two years? Will that person be as comfortable with C-style strings as you are? As the STL matures, people will likely become more comfortable with STL strings than with C-style strings.

Upvotes: 0

Hoffmann

Reputation: 14729

No Unicode support is reason enough these days...

Upvotes: 2

Eclipse

Reputation: 45533

In your specific case, it's not the C string that's dangerous so much as reading an indeterminate amount of data into a fixed-size buffer. Don't ever use gets(char*), for example.

Looking at your example though, it doesn't seem at all correct - try this:

char buffer[1024];
char * line = NULL;
while ((line = fgets(buffer, sizeof(buffer), fp)) != NULL) {
    // parse one line of command output here.
}

This is a perfectly safe use of c-strings, although you'll have to deal with the possibility that line does not contain an entire line, but was rather truncated to 1023 characters (plus a null terminator).
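
One possible way to detect the truncation is to check whether the buffer ends in a newline after each fgets call. The sketch below assumes a hypothetical helper named read_line and a made-up line_status enum; neither is part of the standard library.

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_EOF, LINE_COMPLETE, LINE_PARTIAL };

/* Hypothetical helper: reads at most size - 1 bytes of one line into buf.
   Returns LINE_COMPLETE if a newline was read (it is stripped),
   LINE_PARTIAL if the buffer filled up or input ended before a newline,
   LINE_EOF if nothing could be read. */
static enum line_status read_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return LINE_EOF;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* drop the newline */
        return LINE_COMPLETE;
    }
    return LINE_PARTIAL;        /* the rest of the line is still unread */
}
```

On LINE_PARTIAL the caller can keep calling read_line to collect or discard the remainder of the line before parsing the next one.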

Upvotes: 3

efotinis

Reputation: 14961

C strings lack the following aspects of their C++ counterparts:

  • Automatic memory management: you have to allocate and free their memory manually.
  • Extra capacity for concatenation efficiency: C++ strings often have a capacity greater than their size. This allows increasing the size without many reallocations.
  • No embedded NULs: by definition a NUL character ends a C string; C++ strings keep an internal size counter so they don't need a special value to mark their end.
  • Sensible comparison and assignment operators: even though comparison of C string pointers is permitted, it's almost always not what was intended. Similarly, assigning C string pointers (or passing them to functions) creates ownership ambiguities.
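
The comparison point is easy to trip over. A small sketch (same_pointer and same_text are illustrative names, not library functions):

```c
#include <string.h>

/* Compares addresses: true only when both pointers refer to the same storage. */
static int same_pointer(const char *a, const char *b)
{
    return a == b;
}

/* Compares contents byte by byte, up to the terminator. */
static int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Two separate arrays that both hold "df" are equal according to same_text but not according to same_pointer, which is usually not what someone writing a == b had in mind.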

Upvotes: 17

John Dibling

Reputation: 101476

You may know that today 1024 bytes is enough to contain any input, but you don't know how things will change tomorrow or next year.

If premature optimization is the root of all evil, magic numbers are the stem.

Upvotes: 8

ConcernedOfTunbridgeWells

Reputation: 66662

C strings, like many other aspects of C, give you plenty of room to hang yourself. They are simple and fast, but unsafe in situations where assumptions such as the null terminator can be violated or input can overrun the buffer. To use them reliably you have to observe fairly hygienic coding practices.

There used to be a saying that the canonical definition of a high-level language was "anything with better string handling than C".

Upvotes: 0

Brian C. Lane

Reputation: 4171

Well, to comment on your specific example, you don't know that the data returned by your call to df will fit into your buffer. Never trust unsanitized input into your application, even when it is supposedly from a known source like df.

For example, if a program named 'df' is placed somewhere in your search path so that it is executed instead of the system df, or if df itself is replaced by a malicious program, it could be used to exploit your buffer limit.

When reading input from a file, use a function that lets you specify the maximum number of bytes to read. fgets() is declared in standard C (including on OS X and Linux) as char *fgets(char *s, int size, FILE *stream); so it is safe to use as long as size does not exceed the size of the buffer.

Upvotes: 6

Ilya

Reputation: 3138

This question doesn't really have an answer.
If you are writing in C, what other options do you have?
If you are writing in C++, why are you asking? What is the reason not to use the C++ primitives?
The only reason I can think of is linking C and C++ code with char * somewhere in the interfaces. Sometimes it is just easier to use char * than to convert back and forth all the time (especially if it's really 'good' C++ code that has 3 different C++ string object types).

Upvotes: 0

quinmars

Reputation: 11573

IMHO, the hardest part of C strings is memory management, because you need to be careful about whether you must pass a copy of a C string or can pass a literal to a function, i.e. will the function free the passed string, or will it keep a reference for longer than the function call? The same applies to C string return values.

So it is not possible to share copies of a C string without considerable effort. In many cases this ends with unnecessary copies of the same C string in memory.

Upvotes: 0

Will Dean

Reputation: 39520

Not having the length accessible in constant-time is a serious overhead in many applications.

Upvotes: 14

Paul Kapustin

Reputation: 3295

I think IT IS OKAY to use them; people have been using them for years. But I would rather use std::string if possible because 1) you don't have to be so cautious every time and can think about the problems of your domain instead of remembering to add another parameter every time, memory management and that kind of stuff... it is just safer to code at a higher level; 2) there are probably some other small concerns which are not a big deal but still matter, like the ones people already mentioned: encoding, Unicode... all the related stuff that the people who created std::string thought of. :)

Update

I worked on a project for half a year. Somehow I was stupid enough to never compile in release mode before delivery....:) Well...luckily there was just one error I found after 3 hours. It was a very simple string buffer overrun.

Upvotes: 2

EvilTeach

Reputation: 28872

C strings have opportunities for misuse, because one has to scan the string to determine where it ends.

strlen - to find the length, scan the string, until you hit the NUL, or access protected memory

strcat - has to scan to find the NUL, in order to determine where to begin concatenating. There is no knowledge within a c string, to tell if there will be a buffer overrun or not.
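
Since the string itself carries neither its length nor its capacity, both have to be passed around by hand. A minimal sketch of a bounded append, assuming a hypothetical helper called append_bounded (not a standard function):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src to dst, where *len is the current
   length of dst and cap is the total size of the dst array.
   Returns 0 on success, or -1 if the result plus its terminator would
   not fit, in which case dst is left unchanged. */
static int append_bounded(char *dst, size_t cap, size_t *len, const char *src)
{
    size_t n = strlen(src);

    if (*len >= cap || n > cap - *len - 1)
        return -1;

    /* No rescan of dst is needed because the caller tracks its length. */
    memcpy(dst + *len, src, n + 1);   /* n + 1 copies the terminator too */
    *len += n;
    return 0;
}
```

Keeping len alongside the buffer avoids the repeated scan that strcat does, and keeping cap makes the overrun check possible at all.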

c strings are risky, but generally faster than string objects.

Upvotes: 0

Tomalak

Reputation: 338316

There is no way to embed NUL characters (if you need them for something) into C style strings.

Upvotes: 6

activout.se

Reputation: 6116

The memory management etc. needed to grow a string (char array), if necessary, is kinda boring to reinvent.
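
To give an idea of what gets reinvented, here is a rough sketch of a growable buffer. The strbuf type and its functions are hypothetical names for illustration, and the sketch ignores size_t overflow for extremely large strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; data is NUL-terminated once it is non-NULL. */
struct strbuf {
    char  *data;
    size_t len;   /* bytes in use, excluding the terminator */
    size_t cap;   /* bytes allocated */
};

/* Appends src, doubling the allocation as needed. Returns 0 on success,
   or -1 if realloc fails (the buffer is left unchanged in that case). */
static int strbuf_append(struct strbuf *sb, const char *src)
{
    size_t n = strlen(src);
    size_t needed = sb->len + n + 1;

    if (needed > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap : 16;
        while (new_cap < needed)
            new_cap *= 2;
        char *p = realloc(sb->data, new_cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = new_cap;
    }
    memcpy(sb->data + sb->len, src, n + 1);   /* includes the terminator */
    sb->len += n;
    return 0;
}

/* Releases the storage and resets the struct to its empty state. */
static void strbuf_free(struct strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = 0;
    sb->cap = 0;
}
```

A zero-initialized struct strbuf is a valid empty buffer, so usage starts with struct strbuf sb = {0}; and ends with strbuf_free(&sb).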

Upvotes: 7

Tomalak

Reputation: 338316

Character encoding issues tend to surface when you have an array of bytes instead of a string of characters.
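
A quick illustration, assuming the bytes are UTF-8 (utf8_code_points is a made-up helper and does no validation):

```c
#include <stddef.h>
#include <string.h>

/* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumes s is valid UTF-8; this sketch does not check that. */
static size_t utf8_code_points(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For a string such as "h\xC3\xA9llo" (the word "héllo" encoded as UTF-8), strlen reports the number of bytes while utf8_code_points reports the number of characters, and the two differ.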

Upvotes: 3
