Reputation: 323
Understandably, going over a buffer errors out (or creates an overflow), but what happens if there are less than 12 bytes used in a 12 byte buffer? Is it possible or does the empty trailing always fill with 0s? Orthogonal question that may help: what is contained in a buffer when it is instantiated but not used by the application yet?
I have looked at a few pet programs in Visual Studio and it seems that they are appended with 0s (or null characters) but I am not sure if this is a MS implementation that may vary across language/ compiler.
Upvotes: 28
Views: 4872
Reputation: 2336
C++ has storage classes including global, automatic and static. The initialization depends on how the variable is declared.
char global[12]; // all 0
static char s_global[12]; // all 0
void foo()
{
static char s_local[12]; // all 0
char local[12]; // automatic storage variables are uninitialized, accessing before initialization is undefined behavior
}
Some interesting details here.
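As a rough sketch of the rules above (the helper names `all_zero`, `static_buffers_start_zeroed`, and `automatic_buffer_after_memset` are made up for illustration), the following checks that static-duration buffers start out zeroed, and shows an automatic buffer being given a known state before it is read:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

char global_buf[12];   /* static storage duration: zero-initialized */

/* Returns 1 if every byte in buf[0..n) is zero, otherwise 0. */
int all_zero(const char *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if both static-duration buffers start out zeroed. */
int static_buffers_start_zeroed(void)
{
    static char local_static[12];   /* also static storage duration */
    return all_zero(global_buf, sizeof global_buf)
        && all_zero(local_static, sizeof local_static);
}

/* An automatic buffer is indeterminate until written, so it is
   cleared explicitly here before anything reads it. */
int automatic_buffer_after_memset(void)
{
    char local[12];
    memset(local, 0, sizeof local);
    return all_zero(local, sizeof local);
}
```

Note that the automatic buffer is never inspected before the `memset`; reading it earlier would be exactly the access the answer warns about.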
Upvotes: 11
Reputation: 104579
Take the following example (within a block of code, not global):
char data[12];
memcpy(data, "Selbie", 6);
Or even this example:
char* data = new char[12];
memcpy(data, "Selbie", 6);
In both of the above cases, the first 6 bytes of data are S, e, l, b, i, and e. The remaining 6 bytes of data are considered "unspecified" (could be anything).
Is it possible or does the empty trailing always fill with 0s?
Not guaranteed at all. The only allocator that I know of that guarantees zero byte fill is calloc. Example:
char* data = (char*)calloc(12, 1); // will allocate an array of 12 bytes and zero-init each byte (the cast is required in C++)
memcpy(data, "Selbie", 6); // copy 6 bytes; the remaining 6 stay zero
what is contained in a buffer when it is instantiated but not used by the application yet?
Technically, as per the most recent C++ standards, the bytes delivered by the allocator are considered "unspecified". You should assume that it's garbage data (anything). Make no assumptions about the content.
Debug builds with Visual Studio will often initialize buffers with 0xcc or 0xcd values, but that is not the case in release builds. There are, however, compiler flags and memory allocation techniques for Windows and Visual Studio with which you can guarantee zero-initialized memory allocations, but they are not portable.
Upvotes: 18
Reputation: 81277
Declared objects of static duration (those declared outside a function, or with a static qualifier) which have no specified initializer are initialized to whatever value would be represented by a literal zero [i.e. an integer zero, floating-point zero, or null pointer, as appropriate, or a structure or union containing such values]. If the declaration of any object (including those of automatic duration) includes an initializer, portions whose values are specified by that initializer will be set as specified, and the remainder will be zeroed as with static objects.
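A small sketch of the initializer rule, assuming a 12-byte buffer like the one in the question (the function names are hypothetical): only 6 characters are supplied, and the rest of the array is zeroed even though it has automatic duration.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the zero bytes in buf[from..n). */
size_t count_zero_tail(const char *buf, size_t from, size_t n)
{
    assert(buf != NULL && from <= n);
    size_t zeros = 0;
    for (size_t i = from; i < n; ++i) {
        if (buf[i] == 0) {
            ++zeros;
        }
    }
    return zeros;
}

/* Returns how many of the 6 trailing elements are zero. */
size_t zero_tail_of_partially_initialized(void)
{
    /* 6 characters are specified; elements 6..11 are zeroed by the
       initializer rule, even for an automatic array. */
    char data[12] = "Selbie";
    return count_zero_tail(data, 6, sizeof data);
}
```

The same applies to a brace initializer such as `{0}`, which is a common way to get a fully zeroed local array.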
For automatic objects without initializers, the situation is somewhat more ambiguous. Given something like:
#include <string.h>
unsigned char static1[5], static2[5];
void test(void)
{
unsigned char temp[5];
strcpy((char *)temp, "Hey"); /* cast needed: temp is unsigned char[] */
memcpy(static1, temp, 5);
memcpy(static2, temp, 5);
}
the Standard is clear that test would not invoke Undefined Behavior, even though it copies portions of temp that were not initialized. The text of the Standard, at least as of C11, is unclear as to whether anything is guaranteed about the values of static1[4] and static2[4], most notably whether they might be left holding different values. A defect report states that the Standard was not intended to forbid a compiler from behaving as though the code had been:
unsigned char static1[5]={1,1,1,1,1}, static2[5]={2,2,2,2,2};
void test(void)
{
unsigned char temp[4];
strcpy((char *)temp, "Hey"); /* cast needed: temp is unsigned char[] */
memcpy(static1, temp, 4);
memcpy(static2, temp, 4);
}
which could leave static1[4] and static2[4] holding different values. The Standard is silent on whether quality compilers intended for various purposes should behave in that fashion. The Standard also offers no guidance as to how the function should be written if the programmer requires that static1[4] and static2[4] hold the same value, but doesn't care what that value is.
Upvotes: 1
Reputation: 11
I think the correct answer is that you should always keep track of how many chars are written. Low-level functions like read and write work the same way: they take or return the number of characters read or written. Likewise, std::string keeps track of the number of characters in its implementation.
Upvotes: 1
Reputation: 4155
It depends on the storage class specifier, your implementation, and its settings.
Some interesting examples:
- Uninitialized stack variables may be set to 0xCCCCCCCC
- Uninitialized heap variables may be set to 0xCDCDCDCD
- Uninitialized static or global variables may be set to 0x00000000
- or it could be garbage.
It's risky to make any assumptions about any of this.
Upvotes: 1
Reputation: 4045
All of the previous answers are very good and very detailed, but the OP appears to be new to C programming. So, I thought a Real World example might be helpful.
Imagine you have a cardboard beverage holder that can hold six bottles. It's been sitting around in your garage so instead of six bottles, it contains various unsavory things that accumulate in the corners of garages: spiders, mouse houses, et al.
A computer buffer is a bit like this just after you allocate it. You can't really be sure what's in it, you just know how big it is.
Now, let's say you put four bottles in your holder. Your holder hasn't changed size, but you now know what's in four of the spaces. The other two spaces, complete with their questionable contents, are still there.
Computer buffers are the same way. That's why you frequently see a bufferSize variable to track how much of the buffer is in use. A better name might be numberOfBytesUsedInMyBuffer but programmers tend to be maddeningly terse.
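To make the "how many bottles are in the holder" idea concrete, here is a minimal sketch that pairs a fixed-size buffer with a count of bytes in use. The `byte_buffer` type, its functions, and the capacity of 12 are assumptions chosen for illustration, not anything from a real library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUFFER_CAPACITY 12

/* A fixed-size buffer plus a count of how many bytes are in use. */
struct byte_buffer {
    unsigned char bytes[BUFFER_CAPACITY];
    size_t used;   /* bytes[0..used) are valid; the rest is unknown */
};

/* Marks the buffer as empty; the storage itself is left untouched. */
void byte_buffer_init(struct byte_buffer *b)
{
    assert(b != NULL);
    b->used = 0;
}

/* Appends at most len bytes and returns how many were stored. */
size_t byte_buffer_append(struct byte_buffer *b, const void *src, size_t len)
{
    assert(b != NULL && src != NULL);
    size_t room = BUFFER_CAPACITY - b->used;
    if (len > room) {
        len = room;   /* never write past the end of the holder */
    }
    memcpy(b->bytes + b->used, src, len);
    b->used += len;
    return len;
}
```

Code that reads the buffer only looks at `bytes[0..used)`, so whatever sits in the remaining slots never matters.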
Upvotes: 3
Reputation: 5421
Consider your buffer, filled with zeroes:
[00][00][00][00][00][00][00][00][00][00][00][00]
Now, let's write 10 bytes to it. Values incrementing from 1:
[01][02][03][04][05][06][07][08][09][10][00][00]
And now again, this time, 4 times 0xFF:
[FF][FF][FF][FF][05][06][07][08][09][10][00][00]
what happens if there are less than 12 bytes used in a 12 byte buffer? Is it possible or does the empty trailing always fill with 0s?
You write as much as you want, the remaining bytes are left unchanged.
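The three diagrams above can be replayed in code. This is only a sketch: `replay_writes` and `DEMO_SIZE` are hypothetical names, and the `[10]` in the diagram is taken to mean decimal 10.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SIZE 12

/* Replays the three steps from the diagrams on a 12-byte buffer. */
void replay_writes(unsigned char buf[DEMO_SIZE])
{
    assert(buf != NULL);

    /* Step 1: start from a buffer filled with zeroes. */
    memset(buf, 0x00, DEMO_SIZE);

    /* Step 2: write 10 bytes, values incrementing from 1. */
    for (size_t i = 0; i < 10; ++i) {
        buf[i] = (unsigned char)(i + 1);
    }

    /* Step 3: write 0xFF four times; bytes 4..11 keep their old values. */
    memset(buf, 0xFF, 4);
}
```

After the call, bytes 4 to 9 still hold 5 to 10 and the last two bytes are still zero, because nothing wrote to them.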
Orthogonal question that may help: what is contained in a buffer when it is instantiated but not used by the application yet?
Unspecified. Expect junk left by programs (or other parts of your program) that used this memory before.
I have looked at a few pet programs in Visual Studio and it seems that they are appended with 0s (or null characters) but I am not sure if this is a MS implementation that may vary across language/ compiler.
It is exactly what you think it is. Somebody did that for you this time, but there are no guarantees it will happen again. It could be a compiler flag that attaches cleaning code. Some versions of MSVC used to fill fresh memory with 0xCD when run in debug but not in release. It can also be a system security feature that wipes memory before giving it to your process (so you can't spy on other apps). Always remember to use memset to initialize your buffer where it matters. Alternatively, mandate a certain compiler flag in the readme if you depend on a fresh buffer containing a certain value.
But cleaning is not really necessary. You take a 12 byte-long buffer. You fill it with 7 bytes. You then pass it somewhere - and you say "here is 7 bytes for you". The size of the buffer is not relevant when reading from it. You expect other functions to read as much as you've written, not as much as possible. In fact, in C it is usually not possible to tell how long the buffer is.
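A sketch of the "here is 7 bytes for you" convention, with a made-up consumer function `sum_bytes`: the reader receives a pointer and a length, and never looks past that length, so the unused tail of the buffer is irrelevant.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the first len bytes of data. Bytes beyond len are never read,
   so it does not matter what the rest of the buffer contains. */
unsigned long sum_bytes(const unsigned char *data, size_t len)
{
    assert(data != NULL);
    unsigned long total = 0;
    for (size_t i = 0; i < len; ++i) {
        total += data[i];
    }
    return total;
}
```

Inside `sum_bytes`, `data` is just a pointer, so the function has no way of discovering the size of the original array; the caller must pass the length.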
And a side note:
Understandably, going over a buffer errors out (or creates an overflow)
It doesn't, and that's the problem. That's why it's a huge security issue: there is no error and the program tries to continue, so it sometimes executes malicious content it was never meant to. So we had to add a bunch of mechanisms to the OS, like ASLR, that increase the probability of the program crashing and decrease the probability of it continuing with corrupted memory. So, never depend on those afterthought guards; watch your buffer boundaries yourself.
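One way to watch the boundaries yourself is to check the size before every copy. The helper below is a hypothetical sketch, not a standard function: it refuses to copy when the data would not fit, instead of silently writing past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies len bytes from src to dst only if they fit in dst_size bytes.
   Returns 0 on success, or -1 (leaving dst untouched) if they do not. */
int copy_checked(void *dst, size_t dst_size, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > dst_size) {
        return -1;   /* would overflow: report it instead of writing */
    }
    memcpy(dst, src, len);
    return 0;
}
```

A plain `memcpy` with the same oversized `len` would not report anything; it would simply overwrite whatever follows the buffer.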
Upvotes: 11
Reputation: 48033
In general, it's not at all unusual for buffers to be underfull. It's often good practice to allocate buffers bigger than they need to be. (Trying to always compute an exact buffer size is a frequent source of error, and often a waste of time.)
When a buffer is bigger than it needs to be, that is, when it contains less data than its allocated size, it's obviously important to keep track of how much data is there. In general there are two ways of doing this: (1) with an explicit count, kept in a separate variable, or (2) with a "sentinel" value, such as the \0 character which marks the end of a string in C.
But then there's the question, if not all of a buffer is in use, what do the unused entries contain?
One answer is, of course, that it doesn't matter. That's what "unused" means. You care about the values of the entries that are used, that are accounted for by your count or your sentinel value. You don't care about the unused values.
There are basically four situations in which you can predict the initial values of the unused entries in a buffer:
- When you allocate an array (including a character array) with static duration, all unused entries are initialized to 0.
- When you allocate an array and give it an explicit initializer, all unused entries are initialized to 0.
- When you call calloc, the allocated memory is initialized to all-bits-0.
- When you call strncpy, the destination string is padded out to size n with \0 characters.
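The strncpy case is probably the least well known of the four, so here is a small sketch of it. The wrapper name `copy_padded` and the `'X'` pre-fill are assumptions added only to make the padding visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills dst with 'X' so stale contents are visible, then copies src
   with strncpy. When src is shorter than n, strncpy pads the rest of
   dst with '\0' bytes. When src has n or more characters, dst is NOT
   null-terminated, so callers must handle that case themselves. */
void copy_padded(char *dst, size_t n, const char *src)
{
    assert(dst != NULL && src != NULL);
    memset(dst, 'X', n);
    strncpy(dst, src, n);
}
```

Copying "Selbie" into a 12-byte buffer this way leaves the six characters followed by six `\0` bytes, with none of the `'X'` fill remaining.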
In all other cases, the unused parts of a buffer are unpredictable, and generally contain whatever they did last time (whatever that means). In particular, you cannot predict the contents of an uninitialized array with automatic duration (that is, one that's local to a function and isn't declared with static), and you cannot predict the contents of memory obtained with malloc. (Some of the time, in those two cases the memory tends to start out as all-bits-zero the first time, but you definitely don't want to ever depend on this.)
Upvotes: 1
Reputation: 27924
The program knows the length of a string because it ends it with a null-terminator, a character of value zero.
This is why in order to fit a string in a buffer, the buffer has to be at least 1 character longer than the number of characters in the string, so that it can fit the string plus the null-terminator too.
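The "one extra character" rule can be sketched as follows; `bytes_needed_for` and `fits_in_buffer` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the buffer size needed to store s, including the terminator. */
size_t bytes_needed_for(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;   /* strlen does not count the '\0' */
}

/* Returns 1 if s, plus its terminator, fits in buffer_size bytes. */
int fits_in_buffer(const char *s, size_t buffer_size)
{
    return bytes_needed_for(s) <= buffer_size;
}
```

For example, "Selbie" has 6 characters but needs 7 bytes, which is also what `sizeof "Selbie"` evaluates to.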
Any space after that in the buffer is left untouched. If there was data there previously, it is still there. This is what we call garbage.
It is wrong to assume this space is zero-filled just because you haven't used it yet: you don't know what that particular memory space was used for before your program got to that point. Uninitialized memory should be handled as if what is in it is random and unreliable.
Upvotes: 4
Reputation: 26117
Writing part of a buffer will not affect the unwritten part of the buffer; it will contain whatever was there beforehand (which naturally depends entirely on how you got the buffer in the first place).
As the other answer notes, static and global variables will be initialized to 0, but local variables will not be initialized (and instead contain whatever was on the stack beforehand). This is in keeping with the zero-overhead principle: initializing local variables would, in some cases, be an unnecessary and unwanted run-time cost, while static and global variables are allocated at load-time as part of a data segment.
Initialization of heap storage is at the option of the memory manager, but in general it will not be initialized, either.
Upvotes: 2