StackedCrooked

Reputation: 35485

Dealing with char buffers

As a C++ programmer I sometimes need to deal with memory buffers using techniques from C. For example:

char buffer[512];
sprintf(buffer, "Hello %s!", userName.c_str());

Or in Windows:

TCHAR buffer[MAX_PATH+1]; // edit: +1 added
::GetCurrentDirectory(sizeof(buffer)/sizeof(TCHAR), &buffer[0]);

The above sample is how I usually create local buffers (a local stack-allocated char array). However, there are many possible variations and so I'm very interested in your answers to the following questions:

Upvotes: 14

Views: 40451

Answers (9)

stinky472

Reputation: 6797

I assume your interest comes about primarily from a performance perspective, since solutions like vector, string, wstring, etc. will generally work even for interacting with C APIs. I recommend learning how to use those and how to use them efficiently. If you really need it, you can even write your own memory allocator to make them super fast. If you are sure they're not what you need, there's still no excuse for you to not write a simple wrapper to handle these string buffers with RAII for the dynamic cases.

With that out of the way:

Is passing the buffer as &buffer[0] better programming style than passing buffer? (I prefer &buffer[0].)

No. I would consider this style to be slightly less useful (admittedly being subjective here) as you cannot use it to pass a null buffer and therefore would have to make exceptions to your style to pass pointers to arrays that can be null. It is required if you pass in data from std::vector to a C API expecting a pointer, however.
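To make the difference concrete, here is a minimal sketch. The function fill_greeting is a hypothetical stand-in for a C API that fills a caller-supplied buffer; it is not from any real library. It shows that both spellings pass the same address for a raw array, that a null pointer can be passed directly, and that a std::vector needs &v[0] (or v.data() since C++11).

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a C API that fills a caller-supplied buffer.
// Returns the number of characters written, excluding the terminator.
std::size_t fill_greeting(char* out, std::size_t size) {
    if (out == nullptr || size == 0) return 0;   // a null buffer is tolerated
    const char text[] = "hello";
    std::size_t n = sizeof(text) - 1;
    if (n >= size) n = size - 1;                 // truncate to fit
    std::memcpy(out, text, n);
    out[n] = '\0';
    return n;
}

// A raw array decays to a pointer, so both spellings yield the same address.
bool array_spellings_agree() {
    char buffer[16];
    return static_cast<char*>(buffer) == &buffer[0];
}

// A std::vector does not decay, so &v[0] is required here.
// The empty check matters: &v[0] on an empty vector is undefined behavior.
std::size_t fill_vector(std::vector<char>& v) {
    return v.empty() ? 0 : fill_greeting(&v[0], v.size());
}
```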

Is there a maximum size that is considered safe for stack allocated buffers?

This depends on your platform and compiler settings. Simple rule of thumb: if you're in doubt about whether your code will overflow the stack, write it in a way which can't.

Is a static buffer (static char buffer[N];) faster? Are there any other arguments for or against it?

Yes, there is a big argument against it, and that is that it makes your function no longer re-entrant. If your application becomes multithreaded, these functions will not be thread safe. Even in a single-threaded application, sharing the same buffer when these functions are recursively called can lead to problems.
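A small sketch of the problem, using a hypothetical helper named format_id (an assumption for illustration only). Even without threads, two calls in a row are enough to show that the first result is silently overwritten.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper that formats into a function-local static buffer.
const char* format_id(int id) {
    static char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "id-%d", id);
    return buffer;   // every call returns the same address
}

// Both pointers alias the single static buffer, so the second call
// overwrites what the first pointer refers to.
bool second_call_clobbers_first() {
    const char* first = format_id(1);
    const char* second = format_id(2);
    return first == second && std::string(first) == "id-2";
}
```

The same aliasing is what breaks recursive callers and concurrent threads.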

What about using static char * buffer = new char[N]; and never deleting the buffer? (Reusing the same buffer each call.)

We still have the same problems with re-entrancy.

I understand that heap allocation should be used when (1) dealing with large buffers or (2) maximum buffer size is unknown at compile time. Are there any other factors that play in the stack/heap allocation decision?

Stack unwinding destroys objects on the stack. This is especially important for exception-safety. Thus even if you allocate memory on the heap within a function, it should generally be managed by an object on the stack (ex: smart pointer). ///@see RAII.
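As a rough illustration of that point, the sketch below lets a std::vector on the stack own the heap buffer. The function describe and its parameters are made up for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical function, not from the thread: formats a greeting into a
// heap buffer that is owned by a stack object (std::vector).
std::string describe(const std::string& name, std::size_t capacity) {
    std::vector<char> buffer(capacity);   // heap storage, released by the destructor
    if (name.empty()) {
        // Stack unwinding destroys `buffer`, so nothing leaks here.
        throw std::invalid_argument("empty name");
    }
    if (buffer.empty()) {
        return std::string();             // no room for even the terminator
    }
    // snprintf truncates to fit and always terminates a nonempty buffer.
    std::snprintf(buffer.data(), buffer.size(), "Hello %s!", name.c_str());
    return std::string(buffer.data());
}
```

With a raw new char[capacity] in place of the vector, the throw would leak the buffer unless every exit path remembered to delete[] it.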

Should you prefer the sprintf_s, memcpy_s, ... variants? (Visual Studio has been trying to convince me of this for a long time, but I want a second opinion :p )

MS was right about these functions being safer alternatives since they don't have buffer overflow problems, but if you write such code just as is (without writing variants for other platforms), your code will be married to Microsoft since it will be non-portable.

When using static buffers you can use return type const char *. Is this (generally) a good or a bad idea? (I do realize that the caller will need to make his own copy to avoid that the next call would change the previous return value.)

In almost every case, a function that returns a pointer to a character buffer should return const char*. Returning a mutable char* is generally confusing and problematic, because the pointer refers to one of three things: global/static data, which the function shouldn't be using in the first place (see re-entrancy above); data owned by a class (if it's a method), in which case returning it ruins the class's ability to maintain invariants by letting clients tamper with it however they like (ex: a stored string must always be valid); or memory that was specified by a pointer passed in to the function (the only case where one might reasonably argue that a mutable char* should be returned).

Upvotes: 7

bta

Reputation: 45057

1) buffer and &buffer[0] should be equivalent.

2) Stack size limits will depend on your platform. For most simple functions, my personal rule of thumb is anything over ~256KB is declared dynamically; there's no real rhyme or reason to that number, though, it's just my own convention and it's currently within the default stack sizes for all of the platforms I develop for.

3) Static buffers aren't faster or slower (for all intents and purposes). The only difference is the access control mechanism. The compiler generally places static data in a different section of the binary file than non-static data, but there is no noticeable/significant performance benefit or penalty involved. The only real way to tell for sure is to write the program both ways and time them (since many of the speed aspects involved here are dependent on your platform/compiler).

4) Don't return a const pointer if the caller will need to modify it (that defeats the point of const). Use const for function parameters and return types if and only if they are not designed to be modified. If the caller will need to modify the value, your best bet is for the caller to pass the function a pointer to a pre-allocated buffer (along with the buffer size) and for the function to write the data into that buffer.

5) Reusing a buffer may lead to a performance improvement for larger buffers due to bypassing the overhead that is involved in calling malloc/free or new/delete each time. However, you run the risk of accidentally using old data if you forget to clear the buffer each time or you try to run two copies of the function in parallel. Again, the only real way to know for sure is to try both ways and measure how long the code takes to run.

6) Another factor in stack/heap allocation is scoping. A stack variable goes out of scope when the function it lives in returns, but a variable that was dynamically allocated on the heap can be returned to the caller safely or accessed the next time the function is called (a la strtok).

7) I would recommend against the use of sprintf_s, memcpy_s, and friends. They are not part of the standard library and are not portable. The more you use these functions, the more extra work you will have when you want to run your code on a different platform or use a different compiler.
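Point 4 above (the caller passes a pre-allocated buffer plus its size) could look roughly like this. The name write_label and its return convention are assumptions for the sake of the example.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical function: writes into a caller-owned buffer and reports
// whether the whole result fit. On truncation the buffer still holds a
// terminated prefix.
bool write_label(char* out, std::size_t out_size, int value) {
    if (out == nullptr || out_size == 0) return false;
    int n = std::snprintf(out, out_size, "value=%d", value);
    // snprintf returns the length the full output would have had.
    return n >= 0 && static_cast<std::size_t>(n) < out_size;
}
```

The caller decides where the buffer lives (stack, heap, member), so the function stays re-entrant and there is no question about who frees what.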

Upvotes: 2

Mark Ransom

Reputation: 308196

If a function gives you a method of knowing how many characters it will return, use it. Your sample GetCurrentDirectory is a good example:

DWORD length = ::GetCurrentDirectory(0, NULL);

Then you can use a dynamically allocated array (either string or vector) to get the result:

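std::vector<TCHAR> buffer(length, 0);  // length includes the terminating null
// assert(buffer.capacity() >= length);  // should always be true
if (length > 0)  // 0 means the size query failed; &buffer[0] would be invalid
    ::GetCurrentDirectory(length, &buffer[0]);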
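The same "ask for the length first" pattern works outside Windows too. Here is a portable sketch using snprintf, assuming a C99-conforming implementation where a null buffer with size 0 only reports the required length. The function name greet is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: sizes the buffer from a first "dry run" call.
std::string greet(const std::string& user_name) {
    // First pass: no buffer, size 0. Only the required length is returned.
    int needed = std::snprintf(nullptr, 0, "Hello %s!", user_name.c_str());
    if (needed < 0) throw std::runtime_error("formatting failed");

    // Second pass: the buffer is exactly large enough, terminator included.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::snprintf(buffer.data(), buffer.size(), "Hello %s!", user_name.c_str());
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

No fixed 512-byte guess is needed, and a long user name cannot overflow anything.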

Upvotes: 1

INS

Reputation: 10820

Is passing the buffer as &buffer[0] better programming style than passing buffer? (I prefer &buffer[0].)

It depends on the coding standards. I personally prefer: buffer + index instead of &buffer[index] but it's a matter of taste.

Is there a maximum size that is considered safe for stack allocated buffers?

It depends on the stack size. If the amount of stack needed for your buffer exceeds the amount available, the result is a stack overflow.

Is static char buffer[N]; faster? Are there any other arguments for or against it?

Yes, it should be faster. See also this question: Is it bad practice to declare an array mid-function

When using static buffers you can have your function use the const char * return type. Is this a good idea? (I do realize that the caller will need to make his own copy to avoid that the next call would change the previous return value.)

Not sure what static means in this case but:

  1. If the variable is declared on the stack (char buf[100]): do not return pointers or references to it. The storage becomes invalid when the function returns and is overwritten as soon as the stack is reused (e.g. by the next function call).

  2. If the variable is declared as static, it will make your code non-reentrant. strtok is an example of this.

What about using static char * buffer = new char[N]; and never deleting the buffer? (Reusing the same buffer each call.)

It is a possibility, though not recommended because it makes your code non-reentrant.

I understand that heap allocation should be used when (1) dealing with large buffers or (2) maximum buffer size is unknown at compile time. Are there any other factors that play in the stack/heap allocation decision?

Stack size of the running thread is too small to fit stack declaration (previously mentioned).

Should you prefer the sprintf_s, memcpy_s, ... variants? (Visual Studio has been trying to convince me of this for a long time, but I want a second opinion :p )

If you want your code to be portable: No. But the effort in creating a portable macro is quite small in this case:

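// this is not tested - it is just an example
// sprintf_s takes the buffer size, so both branches map to a size-taking call
#ifdef _MSC_VER
 #define SPRINTF(buf, size, ...) sprintf_s((buf), (size), __VA_ARGS__)
#else
 #define SPRINTF(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)
#endif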

Upvotes: 0

Josh Kelley

Reputation: 58352

Is passing the buffer as &buffer[0] better programming style than passing buffer? (I prefer &buffer[0].)

&buffer[0] makes the code less readable to me. I have to pause for a second and wonder why someone used it instead of just passing buffer. Sometimes you have to use &buffer[0] (if buffer is a std::vector), but otherwise, stick with the standard C style.

Is there a maximum size that is considered safe for stack allocated buffers?

I doubt there's any practical limit, as long as you're using the stack reasonably. I've never had any problems in my development.

If I'm reading MSDN correctly, threads on Windows default to 1MB of stack size. This is configurable. Other platforms have other limits.

Is static char buffer[N]; faster? Are there any other arguments for or against it?

On the one hand, it might reduce the need to commit memory pages for stack, so your app might run faster. On the other hand, going to the BSS segment or equivalent might reduce cache locality compared to the stack, so your app might run slower. I seriously doubt you'd notice the difference either way.

Using static is not threadsafe, while using the stack is. That's a huge advantage to the stack. (Even if you don't think you'll be multithreaded, why make life harder if that changes in the future?)

When using static buffers you can have your function use the const char * return type. Is this a good idea? (I do realize that the caller will need to make his own copy to avoid that the next call would change the previous return value.)

Const correctness is always a good thing.

Returning pointers to static buffers is error-prone; a later call might modify it, another thread might modify it, etc. Use std::string or other automatically managed memory instead (even if your function needs to deal internally with char buffers, such as in your GetCurrentDirectory example).
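For instance, a function can keep the char buffer as a private detail and hand back a std::string. This sketch uses strftime as the C API; the helper name format_date is an assumption.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: the char buffer never leaves the function, and the
// caller receives an independent std::string.
std::string format_date(const std::tm& when) {
    char buffer[32];
    // strftime returns the number of characters written, or 0 if the
    // result (including the terminator) did not fit.
    std::size_t n = std::strftime(buffer, sizeof(buffer), "%Y-%m-%d", &when);
    return std::string(buffer, n);
}
```

Each call returns its own copy, so a later call or another thread cannot change a value the caller already holds.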

What about using static char * buffer = new char[N]; and never deleting the buffer? (Reusing the same buffer each call.)

Less efficient than just using static char buffer[N], since you need a heap allocation.

I understand that heap allocation should be used when (1) dealing with large buffers or (2) maximum buffer size is unknown at compile time. Are there any other factors that play in the stack/heap allocation decision?

See Justin Ardini's answer.

Should you prefer the sprintf_s, memcpy_s, ... variants? (Visual Studio has been trying to convince me of this for a long time, but I want a second opinion :p )

This is a matter of some debate. Personally, I think these functions are a good idea, and if you're targeting Windows exclusively, then there's some benefit to taking the preferred Windows approach and using those functions. (And they're fairly straightforward to reimplement if you later need to target something other than Windows, as long as you don't rely on their error-handling behavior.) Others think that the Secure CRT functions are no more secure than properly used C and introduce other disadvantages; Wikipedia links to a few arguments against them.

Upvotes: 1

Patrick

Reputation: 23619

  • Buffer or &Buffer[0] is exactly the same. You could even write Buffer+0. Personally I prefer just to write Buffer (and I think most developers also prefer this), but this is your personal choice
  • The maximum depends on how big and how deep your stack is. If you are already 100 functions deep in the stack, the maximum size will be smaller. If you can use C++, you could write a buffer class that dynamically chooses whether to use the stack (for small sizes) or the heap (for large sizes). You will find the code below.
  • A static buffer is faster since the compiler will reserve the space for you beforehand. A stack buffer is also fast. For a stack buffer the application just has to increase the stack pointer. For a heap buffer, the memory manager has to find free space, ask the operating system for new memory, afterwards free it again, do some bookkeeping, ...
  • If possible use C++ strings to avoid memory leaks. Otherwise, the caller has to know whether he has to free the memory afterwards or not. The downside is that C++ strings are slower than static buffers (since they are allocated on the heap).
  • I wouldn't use memory allocation on global variables. When are you going to delete it? And can you be sure that no other global variable will need the allocated memory (and be used before your static buffer is allocated)?
  • Whatever kind of buffer you use, try to hide the implementation from the caller of your function. You could hide the buffer pointer in a class and let the class remember whether the buffer is dynamically allocated or not (and thus whether it should delete it in its destructor). Afterwards it's easy to change the type of the buffer, which you can't do if you just return a char pointer.
  • Personally I prefer the normal sprintf variants, but that's probably because I still have lots of older code and I don't want a mixed situation. In any case, consider using snprintf, where you can pass the buffer size.

Code for dynamic stack/heap buffer:

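#include <cstddef>

template<size_t BUFSIZE,typename eltType=char>
class DynamicBuffer
   {
   private:
      // Buffers of MAXSIZE elements or more go on the heap.
      const static size_t MAXSIZE=1000;
   public:
      DynamicBuffer() : m_pointer(0) {if (BUFSIZE>=MAXSIZE) m_pointer = new eltType[BUFSIZE];}
      ~DynamicBuffer() {if (BUFSIZE>=MAXSIZE) delete[] m_pointer;}
      operator eltType * () { return BUFSIZE>=MAXSIZE ? m_pointer : m_buffer; }
      operator const eltType * () const { return BUFSIZE>=MAXSIZE ? m_pointer : m_buffer; }
   private:
      // Not copyable: a copy would share m_pointer and delete it twice.
      DynamicBuffer(const DynamicBuffer &);
      DynamicBuffer &operator=(const DynamicBuffer &);
      // In-object storage; shrinks to one element when the heap is used.
      eltType m_buffer[BUFSIZE<MAXSIZE?BUFSIZE:1];
      eltType *m_pointer;
   };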

Upvotes: 0

Amardeep AC9MF

Reputation: 19044

  1. Stay away from static buffers if you ever want to use your code re-entrantly.

  2. Use snprintf() instead of sprintf() so you can control buffer overruns.

  3. You never know how much stack space is left in the context of your call -- so no size is technically 'safe'. You have a lot of headroom to play with most of the time. But that one time will get you good. I use a rule of thumb to never put arrays on the stack.

  4. Have the client own the buffer and pass it and its size to your function. That makes it re-entrant and leaves no ambiguity as to who needs to manage the life of the buffer.

  5. If you're dealing with string data, double check your string functions to make sure they terminate especially when they hit the end of the buffer. The C library is very inconsistent when it comes to handling string termination across the various functions.
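The inconsistency in point 5 is easy to demonstrate. In this sketch (function names are made up), strncpy fills the destination without a terminator when the source is too long, while snprintf truncates and still terminates.

```cpp
#include <cstdio>
#include <cstring>

// strncpy leaves the destination unterminated when the source does not fit.
bool strncpy_left_unterminated() {
    char dest[4];
    std::memset(dest, 'x', sizeof(dest));
    std::strncpy(dest, "abcdef", sizeof(dest));         // copies "abcd", no '\0'
    return std::memchr(dest, '\0', sizeof(dest)) == nullptr;
}

// snprintf truncates but always terminates when the size is nonzero.
bool snprintf_terminated() {
    char dest[4];
    std::memset(dest, 'x', sizeof(dest));
    std::snprintf(dest, sizeof(dest), "%s", "abcdef");  // stores "abc" plus '\0'
    return dest[3] == '\0' && std::strncmp(dest, "abc", 3) == 0;
}
```

Some compilers warn about the strncpy call here, which is rather the point.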

Upvotes: 8

Justin Ardini

Reputation: 9866

You have a lot of questions! I'll do my best to answer a couple and give you a place to look for the others.

Is there a maximum size that is considered safe for stack allocated buffers?

Yes, but the stack size itself varies based on the platform you are working on. See When do you worry about stack size? for a very similar question.

Is static char buffer[N]; faster? Are there any other arguments for or against it?

The meaning of static is dependent on where the buffer is declared, but I assume you are talking about a static declared inside a function, so it is initialized only once. In functions called many times, using static buffers may be a good idea to prevent stack overflow, but otherwise, keep in mind that allocating buffers is a cheap operation. Also, static buffers are much harder to work with when dealing with multiple threads.

For answers to most of your other questions, see Large buffers vs Large static buffers, is there an advantage?.

Upvotes: 3

GManNickG

Reputation: 503865

  • It's up to you. Just passing buffer is terser, but if it were a vector, you'd need &buffer[0] anyway.
  • Depends on your intended platform.
  • Does it matter? Have you determined it to be a problem? Write the code that's easiest to read and maintain before you go off worrying if you can obfuscate it into something faster. But for what it's worth, allocation on the stack is very fast (you just change the stack pointer value.)
  • You should be using std::string. If performance becomes a problem, you'd be able to reduce dynamic allocations by just returning the internal buffer. But the std::string return interface is way nicer and safer, and performance is your last concern.
  • That's a memory leak. Many will argue that's okay, since the OS frees it anyway, but I feel it's terrible practice to just leak things. Use a static std::vector; you should never be doing any raw allocation! If you're putting yourself into a position where you might leak (because it needs to be done explicitly), you're doing it wrong.
  • I think your (1) and (2) just about cover it. Dynamic allocation is almost always slower than stack allocation, but you should be more concerned about which makes sense in your situation.
  • You shouldn't be using those at all. Use std::string, std::stringstream, std::copy, etc.
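As an example of the last point, the sprintf call from the question could be written with std::ostringstream instead. The function name and the message text are invented for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical replacement for a sprintf-into-char-array call: the stream
// grows as needed, so there is no buffer size to get wrong.
std::string make_greeting(const std::string& user_name, int unread) {
    std::ostringstream out;
    out << "Hello " << user_name << "! You have " << unread << " new messages.";
    return out.str();
}
```

If a C API needs the result, pass make_greeting(...).c_str() while the returned string is still alive.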

Upvotes: 4
