Reputation: 166046
I was helping a friend with some C++ homework. I warned said friend that the kind of programming I do (PHP, Perl, Python) is pretty different from C++, and there were no guarantees I wouldn't tell horrible lies.
I was able to answer his questions, but not without stumbling over my own dynamic background. While I was reacquainting myself with C++ array semantics, I did something stupid like this (a simplified example to make my question clearer):
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char easy_as_one_two_three[] = {'A','B','C'};
int an_int = 1;
//I want an array that has a length of the value
//that's currently in an_int (1)
//This clearly (to a c++ programmer) doesn't do that.
//but what is it doing?
char breaking_things[an_int];
cout << easy_as_one_two_three << endl;
return 1;
}
When I compile and run this program, it produces the following output:
ABC????
However, if I comment out my bogus array declaration:
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char easy_as_one_two_three[] = {'A','B','C'};
int an_int = 1;
//I want an array that has a length of the value
//that's currently in an_int (1)
//This clearly (to a c programmer) doesn't do that.
//but what is it doing?
//char breaking_things[an_int];
cout << easy_as_one_two_three << endl;
return 1;
}
I get the output I expect:
ABC
So, what exactly is happening here? I understand (vaguely) that when you create an array, you're pointing to a specific memory address, and when you give an array a length, you're telling the computer "reserve the next X blocks for me".
What I don't understand is, when I use a variable in an array declaration, what am I telling the computer to do, and why does it have an effect on a completely separate array?
Compiler is g++, version string is
science:c++ alanstorm$ g++ -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5493~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5493)
Upvotes: 3
Views: 539
Reputation: 320391
That will not "reacquaint" you "with c++ array semantics", since in C++ it is simply illegal. In C++, arrays can only be declared with sizes defined by Integral Constant Expressions (ICE). In your example the size is not an ICE. It only compiles because of a GCC-specific extension.
From the C point of view, this is actually perfectly legal in the C99 version of the language. And it does produce a so-called Variable Length Array of length 1. So your "clearly" comment is incorrect.
Upvotes: 6
Reputation: 59451
Update:
Neil pointed out in his comment to the question that you will get an error if you compile this with the -Wall and -pedantic flags in g++.
error: ISO C++ forbids variable-size array
You are getting ABC???? because it prints the contents of the array (ABC) and continues to print until it encounters a \0.
Had the array been {'A','B','C','\0'}, the output would have been just ABC, as expected.
Variable-length arrays were introduced in C99 - this doesn't seem to apply to C++ though.
It is undefined behavior. Even if you comment out the bogus declaration, the printed output is not always what you expect (ABC). Try assigning the ASCII value of some printable character (something between 32 and 126) to an_int instead of 1 and you will see the difference.
an_int            output
------------------------
40                ABC(
65                ABCA
66                ABCB
67                ABCC
296               ABC(
552               ABC(
1064              ABC(
1024*1024 + 40    ABC(
See the pattern here? Apparently it interprets the least significant byte (LSB) of an_int as a char, prints it, somehow finds a null char afterwards and stops printing. I think the "somehow" has something to do with the higher-order (MSB) portion of an_int being filled with zeros, but I'm not sure (and couldn't get any results to support this argument either).
UPDATE: It is about the MSB portion being filled with zeros. I got the following results:
ABC( for 40 - (3 zero bytes and a 40),
ABC(( for 10280 (which is (40 << 8) + 40) - (2 zero bytes and two 40s),
ABC((( for 2631720 (which is (10280 << 8) + 40) - (1 zero byte and three 40s),
ABC((((°¿® for 673720360 (which is (2631720 << 8) + 40) - no zero bytes and hence prints random chars until a zero byte is found.
ABCDCBA0á´¿á´¿® for (((((65 << 8) + 66) << 8) + 67) << 8) + 68;
These results were obtained on a little endian processor with 8-bit atomic element size and 1-byte address increment, where 32 bit integer 40 (0x28 in hex) is represented as 0x28-0x00-0x00-0x00 (LSB at the lowest address). Results might vary from compiler to compiler and platform to platform.
Now if you try uncommenting the bogus declaration, you will find that all the outputs are of the form ABC-randomchars-char_corresponding_to_an_int. This again is the result of undefined behavior.
Upvotes: 12
Reputation: 287
You get the output that you expect or don't expect by dumb luck. Because you didn't null-terminate the characters in your array, printing it to cout outputs the A, the B, and the C, and then whatever else it finds until it hits a null character. With the array declaration, the compiler is probably pushing something onto the stack to size the array at runtime, which leaves you with garbage characters after the A, B, and C; without it, there just happens to be a 0 after the C on the stack.
Again, it's just dumb luck. To always get what you expect you should do: char easy_as_one_two_three[] = { 'A','B','C','\0'}; or, probably more usefully, char easy_as_one_two_three[] = "ABC"; which will properly null-terminate the string.
Upvotes: 3
Reputation: 18340
It isn't invalid syntax. It's syntactically just fine.
It's semantically invalid C++, and rejected by my compiler (VC++). g++ seems to have an extension that allows the use of C99 VLAs in C++.
The reason for the question marks is that your array of three characters is not null terminated; it's printing until it finds a null on the stack. The layout of the stack is influenced by the variables declared on the stack. With the array, the layout is such that there's garbage prior to the first null; without the array there isn't. That is all.
Upvotes: 3
Reputation: 8529
The output looks like this because it prints the contents of the char array until it finds a null character.
Make sure the char array is a null-terminated string, and specify the size of the array as the total number of chars + 1 (for the null char).
Upvotes: 0
Reputation:
It's probably not breaking_things that broke things. The first array is not a NUL (\0) terminated string, which explains the output - cout will print whatever comes after ABC up until the first NUL it encounters.
As for the size of breaking_things, I would suspect it differs between compilers. I believe at least earlier versions of gcc used whatever value the variable happened to have at compile time, which can be tricky to determine.
Upvotes: 0
Reputation: 26873
char breaking_things[an_int] allocates a char array of size an_int (in your case 1). It's called a variable length array, and it's a relatively new feature.
In a case like this it's more common to dynamically allocate memory using new:
char* breaking_things = new char[an_int]; // C++ way, C programmer would use malloc
Upvotes: 2