Reputation: 85
I was solving a problem that involved incrementing a counter and displaying it. The way I initialized and incremented the variable seemed pretty normal. See the counter variable in the code below:
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
    char s[5];
    int counter = 1;
    while (cin >> s && (strcmp(s, "*") != 0))
    {
        cout << "Case " << counter++ << ": Hajj-e-A";
        if (s[0] == 'H')
        {
            cout << "kbar\n";
        }
        else if (s[0] == 'U')
        {
            cout << "sghar\n";
        }
    }
}
But the program mysteriously displayed an incorrect result: the counter, which started at 1, was not incremented properly. See the output:
Case 1: Hajj-e-Akbar
Case 0: Hajj-e-Asghar
Case 1: Hajj-e-Akbar
Case 0: Hajj-e-Asghar
But when I tried compiling and running it through http://www.tutorialspoint.com/compile_cpp_online.php, which uses Linux, it produced correct results. The program was also accepted by the online judge.
Case 1: Hajj-e-Akbar
Case 2: Hajj-e-Asghar
Case 3: Hajj-e-Akbar
Case 4: Hajj-e-Asghar
Can anyone explain the mystery behind this? Why does the Windows-compiled code produce weird results? Many thanks!
Upvotes: 1
Views: 707
Reputation: 1679
This is a buffer overflow. Most likely, when you compile on Windows, the counter variable immediately follows the s[5] variable in memory, like this:
+----+----+----+----+----+----+----+----+----+
| ?? | ?? | ?? | ?? | ?? | 01 | 00 | 00 | 00 |
+----+----+----+----+----+----+----+----+----+
\________ s[5] ________/ \____ counter ____/
Since x86 processors (which Windows PCs typically use) are little-endian, the value 1 is stored as 01 00 00 00 instead of 00 00 00 01 like you might expect. The ?? just indicates that we don't know yet what's there; it could be anything.
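To check the byte order on a particular machine, the bytes of an int can be copied out and inspected. This is only a small sketch; the helper names bytes_of and host_is_little_endian are made up for illustration and are not part of the original program.

```cpp
#include <array>
#include <cstring>

// Hypothetical helper: returns the bytes of an int in the order they sit in memory.
std::array<unsigned char, sizeof(int)> bytes_of(int value)
{
    std::array<unsigned char, sizeof(int)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);  // well-defined way to view the bytes
    return bytes;
}

// Hypothetical helper: true when the least significant byte is stored first.
bool host_is_little_endian()
{
    return bytes_of(1)[0] == 1;
}
```

On a little-endian machine bytes_of(1) starts with 01 and the remaining bytes are 00; on a big-endian machine the 01 comes last.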
Now, let's say you enter "Hardy" and press Enter. In ASCII, that translates to the byte sequence 48 61 72 64 79 0D 0A (the last two are the line ending; on UNIX the 0D would be omitted). This is what cin >> s will do to the memory:
1. Read in 'H':
+----+----+----+----+----+----+----+----+----+
| 48 | ?? | ?? | ?? | ?? | 01 | 00 | 00 | 00 |
+----+----+----+----+----+----+----+----+----+
2. Read in 'a':
+----+----+----+----+----+----+----+----+----+
| 48 | 61 | ?? | ?? | ?? | 01 | 00 | 00 | 00 |
+----+----+----+----+----+----+----+----+----+
3. Read in 'r':
+----+----+----+----+----+----+----+----+----+
| 48 | 61 | 72 | ?? | ?? | 01 | 00 | 00 | 00 |
+----+----+----+----+----+----+----+----+----+
4. Read in 'd':
+----+----+----+----+----+----+----+----+----+
| 48 | 61 | 72 | 64 | ?? | 01 | 00 | 00 | 00 |
+----+----+----+----+----+----+----+----+----+
5. Read in 'y':
+----+----+----+----+----+----+----+----+----+
| 48 | 61 | 72 | 64 | 79 | 01 | 00 | 00 | 00 |
+----+----+----+----+----+----+----+----+----+
6. Read in '\r\n' (or on UNIX, just '\n'), but this isn't put into the memory.
Instead, cin sees the whitespace, stops reading, and closes off the string with a '\0':
+----+----+----+----+----+----+----+----+----+
| 48 | 61 | 72 | 64 | 79 | 00 | 00 | 00 | 00 |
+----+----+----+----+----+----+----+----+----+
\________ s[5] ________/ \____ counter ____/
Whoops! It overwrote the counter!
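The six steps above can be replayed safely by modelling the nine bytes from the diagrams as a single array, so nothing is actually written out of bounds. This is a rough sketch under the same assumption as the diagrams (counter directly after s, stored little-endian); the function name counter_after_read is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical model of the diagrams: bytes 0-4 stand for s, bytes 5-8 for counter.
// Everything stays inside one 9-byte array, so the simulated overflow is well-defined.
std::uint32_t counter_after_read(const std::string& word, std::uint32_t counter)
{
    unsigned char memory[9] = {};
    assert(word.size() + 1 <= sizeof memory);  // keep the simulation in bounds

    // Store the counter little-endian: least significant byte first.
    for (std::size_t i = 0; i < 4; ++i)
        memory[5 + i] = static_cast<unsigned char>((counter >> (8 * i)) & 0xFF);

    // What an unbounded cin >> s does: copy every character, then append '\0'.
    std::size_t pos = 0;
    for (char c : word)
        memory[pos++] = static_cast<unsigned char>(c);
    memory[pos] = 0;

    // Read the counter back from its four bytes.
    std::uint32_t result = 0;
    for (std::size_t i = 0; i < 4; ++i)
        result |= static_cast<std::uint32_t>(memory[5 + i]) << (8 * i);
    return result;
}
```

Under this model, counter_after_read("Hardy", 1) returns 0. It would also account for the alternating output in the question if the inputs were "Hajj" and "Umrah" (an assumption; the question does not show its input). The four-letter word fits, so "Case 1" is printed and the counter becomes 2. The five-letter word then zeroes the low byte, so "Case 0" is printed and the counter goes back to 1.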
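Why does it work correctly on Linux? Most likely the compiler used there doesn't place the two variables adjacent in memory (or places them in the opposite order). The other possibility is that the Linux system is big-endian, which is uncommon on PCs but would mean the memory is instead laid out like this: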
+----+----+----+----+----+----+----+----+----+
| ?? | ?? | ?? | ?? | ?? | 00 | 00 | 00 | 01 |
+----+----+----+----+----+----+----+----+----+
So even if you read in 5 characters, the final null terminator just replaces a 0 that was already there. Of course, if this is the cause, then reading in 6 characters would really mess things up.
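The big-endian case can be sketched with the same kind of nine-byte model, this time with the most significant byte of the counter stored first. It is only an illustration of the layout in the diagram; the function name counter_after_read_big_endian is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical model: bytes 0-4 stand for s, bytes 5-8 for a big-endian counter.
std::uint32_t counter_after_read_big_endian(const std::string& word, std::uint32_t counter)
{
    unsigned char memory[9] = {};
    assert(word.size() + 1 <= sizeof memory);  // keep the simulation in bounds

    // Store the counter big-endian: most significant byte first.
    for (std::size_t i = 0; i < 4; ++i)
        memory[5 + i] = static_cast<unsigned char>((counter >> (8 * (3 - i))) & 0xFF);

    // Copy the characters and the terminating '\0', as an unbounded read would.
    std::size_t pos = 0;
    for (char c : word)
        memory[pos++] = static_cast<unsigned char>(c);
    memory[pos] = 0;

    // Decode the four counter bytes, most significant first.
    std::uint32_t result = 0;
    for (std::size_t i = 0; i < 4; ++i)
        result = (result << 8) | memory[5 + i];
    return result;
}
```

With five characters the terminator lands on a byte that was already 00, so the counter survives. With six characters, the sixth one overwrites the most significant byte and the counter becomes a huge number.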
And how do you fix it? The problem is that, to hold a string of length n, a character array needs to have a length of n+1. So you could do this:
char s[6];
Or even better, use a string:
std::string s;
(For that you need to #include <string>.)
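Put together, the loop from the question might look like the sketch below once it reads into a std::string. The function names run_cases and run_on_text are hypothetical, and the streams are passed in as parameters only so that the loop can be tried on a fixed input.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical rewrite of the original loop: std::string grows to fit the input,
// so a long word can no longer spill into neighbouring variables.
void run_cases(std::istream& in, std::ostream& out)
{
    std::string s;
    int counter = 1;
    while (in >> s && s != "*")
    {
        out << "Case " << counter++ << ": Hajj-e-A";
        if (s[0] == 'H')
            out << "kbar\n";
        else if (s[0] == 'U')
            out << "sghar\n";
    }
}

// Hypothetical convenience wrapper: runs the loop on a string and returns the output.
std::string run_on_text(const std::string& input)
{
    std::istringstream in(input);
    std::ostringstream out;
    run_cases(in, out);
    return out.str();
}
```

In the real program, main would simply call run_cases(std::cin, std::cout).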
Upvotes: 8