Reputation: 1740
I'm trying to figure out how arguments are actually recorded in the compiled binary file of a C/C++ program. The following is my program; I'm just trying to make it as simple as possible:
void f(char a,char b){}
int main(){f(12,23);}
In order to actually be able to "read" the binary file, I need to convert it to some ASCII-representable form. I found out that
grep $'\xx' a.out
actually works, with a.out as the binary file and xx as the decimal ASCII code. But grep can't tell me anything, since it will only output "binary match". And if I force it to print with '-a', it just prints everything. I can, however, use the -c option to see how many matches there are:
grep -c $'\12' b.out (I renamed the file) ==> 4
grep -c $'\23' b.out ==> 3
But in order to study this, I need the exact positions. So I wrote another program which basically prints each char as its ASCII code:
#include<iostream>
using namespace std;
int main(){char c;
while(cin>>c)cout<<(int)c<<' ';}
But when I run the following commands, the results don't match:
./a.out<./b.out|tr ' ' '\n'|grep -c '^12$' ==> 0
./a.out<./b.out|tr ' ' '\n'|grep -c '^23$' ==> 4
I'm wondering: did I write anything wrong in my test program? Or does grep have some kind of special mechanism (like not working byte by byte)? Which one is correct? Or can somebody directly give me the answer to: HOW would "1,2,3,4" in func(1,2,3,4) be recorded in the binary?
EDIT 1: Thanks for the advice. I used "od -tu1" to replace my test program, and it works really well. I also enhanced my test program a little so that the arguments are more obvious and the numbers won't "disappear":
void f(int a,int b,int c,int d,int e,int f,int g,int h,int i,int j,int k,int l,int m,int n,int o,int p,int q,int r,int s,int t){a+=b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t;}
int main(){f(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19);}
By changing those arguments and using the "diff" command, I finally found the position of these numbers in the binary:
0002560 68 36 104 19 0 0 0 199 68 36 96 18 0 0 0 199
0002600 68 36 88 17 0 0 0 199 68 36 80 16 0 0 0 199
0002620 68 36 72 15 0 0 0 199 68 36 64 14 0 0 0 199
0002640 68 36 56 13 0 0 0 199 68 36 48 12 0 0 0 199
0002660 68 36 40 11 0 0 0 199 68 36 32 10 0 0 0 199
0002700 68 36 24 9 0 0 0 199 68 36 16 8 0 0 0 199
0002720 68 36 8 7 0 0 0 199 4 36 6 0 0 0 65 185
0002740 5 0 0 0 65 184 4 0 0 0 185 3 0 0 0 186
0002760 2 0 0 0 190 1 0 0 0 191 0 0 0 0 232 234
As you can see, 19 through 9 are all clearly written here. But from 8 down to 0, things start changing in a way I don't understand: the distance between the numbers becomes smaller. I also don't understand the numbers between them (I do understand that the 0s are the rest of each "int" (little-endian?)). Do those numbers represent some kind of address to "plug in" to? Is that why they differ from position to position, and why their lengths differ too?
Upvotes: 1
Views: 186
Reputation: 15397
Wow. Your question shows that you're willing to experiment and eager to learn, but there's a lot more to understand here than a typical Stack Overflow question covers.
First, grep is a very powerful tool, but not appropriate to your task. You'll be much more interested in od, which will give you the raw binary dump of a file. (Look up its flags to see how to output as hexadecimal, decimal, or octal; if you want to see individual bits, xxd -b can do that.)
Next, if you want to see how values are stored in binary, an executable gives you a mess of stuff to look through. As well as the values you're storing, the executable contains all of the code you compiled. It will be very hard to isolate the (presumably) four bytes that represent each value, and you'll want to read up a lot on the format behind an a.out executable to be able to do it.
It would be much cleaner simply to write a C program that will write a binary file, i.e. something like:
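#include <stdio.h>

int main(void) {
    /* The four values whose stored bytes we want to inspect. */
    int one = 1;
    int two = 2;
    int three = 3;
    int four = 4;

    /* "wb" opens the file for writing in binary mode. */
    FILE* fp = fopen("test.dat", "wb");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    /* Each call writes the raw bytes of one int, exactly as they sit in memory. */
    fwrite(&one, sizeof(int), 1, fp);
    fwrite(&two, sizeof(int), 1, fp);
    fwrite(&three, sizeof(int), 1, fp);
    fwrite(&four, sizeof(int), 1, fp);

    fclose(fp);
    return 0;
}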
There are tons of other ways to write the same code, and some good folks can correct any glaring mistakes I made (it's been a while since I've coded C without a compiler), but that should write only the 4 integers.
Finally, a quick answer to your question. Assuming an int is 32 bits, you'll be writing these numbers in binary. You'll have to look up "big-endian vs. little-endian" to understand the next part, but depending on your architecture, you'll be one or the other. Big-endian is more intuitive, so I'll answer using that concept.
Numbers are stored as 32 bit binary values. (The first bit in an int is the sign bit. If it's 1, the value is negative, and you'll have to look up "two's complement" to understand that notation.) In your case, for "1, 2, 3, 4", only the last 3 bits will matter, so you'll see a lot of 0s:
1: 00000000 00000000 00000000 00000001
2: 00000000 00000000 00000000 00000010
3: 00000000 00000000 00000000 00000011
4: 00000000 00000000 00000000 00000100
Note, this gets really clunky, so we tend to use hexadecimal. Using that, you can represent each 8-bit byte in 2 characters. In hex, your answer would be:
1: 00 00 00 01
2: 00 00 00 02
3: 00 00 00 03
4: 00 00 00 04
17: 00 00 00 11
255: 00 00 00 FF
You've got a lot of learning to do, but keep it up! I think it's wonderful how eager you are to experiment. Hope this helps.
Upvotes: 2