ZisIsNotZis

Reputation: 1740

Binary to ASCII different between C++ and Grep?

I'm trying to figure out how arguments are actually recorded in the compiled binary of a C/C++ program. The following is my program; I tried to make it as simple as possible:

void f(char a,char b){}
int main(){f(12,23);}

In order to actually be able to "read" the binary file, I need to convert it to some ASCII-representable form. I found that

grep $'\xx' a.out

actually works, with a.out as the binary file and xx as the decimal ASCII code. But grep can't tell me anything, since it only reports that the binary file matches. And if I force it to print with '-a', it just prints everything. I can, however, use the -c option to see how many matches there are:

grep -c $'\12' b.out (I renamed the file) ==> 4
grep -c $'\23' b.out                      ==> 3

But in order to study anything, I need the exact positions. So I wrote another program, which basically prints the ASCII code of each char:

#include<iostream>
using namespace std;
int main(){char c;
    while(cin>>c)cout<<(int)c<<' ';}

But when I run the following commands, the results don't match:

./a.out<./b.out|tr ' ' '\n'|grep -c '^12$' ==> 0
./a.out<./b.out|tr ' ' '\n'|grep -c '^23$' ==> 4

I'm wondering: did I write something wrong in my test program? Or does grep have some kind of special mechanism (like not working byte by byte)? Which one is correct? Or can somebody directly tell me HOW "1,2,3,4" in func(1,2,3,4) would be recorded in the binary?


EDIT 1: Thanks for the advice. I used "od -tu1" to replace my test program, which works really well. I also enhanced my test program a little so that the arguments are more obvious and the numbers won't "disappear":

void f(int a,int b,int c,int d,int e,int f,int g,int h,int i,int j,int k,int l,int m,int n,int o,int p,int q,int r,int s,int t){a+=b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t;}
int main(){f(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19);}

By changing those arguments and using the "diff" command, I finally found the positions of these numbers in the binary:

0002560  68  36 104  19   0   0   0 199  68  36  96  18   0   0   0 199
0002600  68  36  88  17   0   0   0 199  68  36  80  16   0   0   0 199
0002620  68  36  72  15   0   0   0 199  68  36  64  14   0   0   0 199
0002640  68  36  56  13   0   0   0 199  68  36  48  12   0   0   0 199
0002660  68  36  40  11   0   0   0 199  68  36  32  10   0   0   0 199
0002700  68  36  24   9   0   0   0 199  68  36  16   8   0   0   0 199
0002720  68  36   8   7   0   0   0 199   4  36   6   0   0   0  65 185
0002740   5   0   0   0  65 184   4   0   0   0 185   3   0   0   0 186
0002760   2   0   0   0 190   1   0   0   0 191   0   0   0   0 232 234

As you can see, 19 down to 9 are all clearly written here. But from 8 down to 0, things start changing in a way I don't understand: the distance between the numbers becomes smaller. I also don't understand what the numbers between them are (I do understand that the 0s are the rest of each "int" (little-endian?)). Do these numbers represent some kind of address to "plug in" to? Is that why they differ by position and their lengths differ too?

Upvotes: 1

Views: 186

Answers (1)

Scott Mermelstein

Reputation: 15397

Wow. Your question shows that you're willing to experiment and eager to learn, but there's a lot more to understand here than a typical Stack Overflow question covers.

First, grep is a very powerful tool, but it's not appropriate for your task. You'll be much more interested in od, which gives you a raw dump of a file's bytes. (Look up its flags to see how to output hexadecimal, decimal, or octal.)

Next, if you want to study binary data, an executable gives you a mess of stuff to look through. As well as the values you're storing, the executable contains all of the code you compiled. It will be very hard to isolate the (presumably) four bytes that represent your variables, and you'll need to read up a lot on the format behind an a.out executable to be able to do it.

It would be much cleaner simply to write a C program that will write a binary file, i.e. something like:

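#include <stdio.h>

int main(void) {
    int one = 1;
    int two = 2;
    int three = 3;
    int four = 4;

    /* "wb" opens the file for writing in binary mode. */
    FILE* fp = fopen("test.dat", "wb");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    /* Each call writes the raw bytes of one int, in the machine's byte order. */
    fwrite(&one, sizeof(int), 1, fp);
    fwrite(&two, sizeof(int), 1, fp);
    fwrite(&three, sizeof(int), 1, fp);
    fwrite(&four, sizeof(int), 1, fp);

    fclose(fp);
    return 0;
}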

There are tons of other ways to write the same code, and some good folks can correct any glaring mistakes I made (it's been a while since I've coded C without a compiler), but that should write only the 4 integers.

Finally, a quick answer to your question. Assuming an int is 32 bits, you'll be writing these numbers in binary. You'll have to look up "big-endian vs. little-endian" to understand the next part, but depending on your architecture, you'll be one or the other. Big-endian is more intuitive, so I'll answer using that concept.

Numbers are stored as 32 bit binary values. (The first bit in an int is the sign bit. If it's 1, the value is negative, and you'll have to look up "two's complement" to understand that notation.) In your case, for "1, 2, 3, 4", only the last 3 bits will matter, so you'll see a lot of 0s:

1: 00000000 00000000 00000000 00000001
2: 00000000 00000000 00000000 00000010
3: 00000000 00000000 00000000 00000011
4: 00000000 00000000 00000000 00000100

Note, this gets really clunky, so we tend to use hexadecimal. Using that, you can represent each 8-bit byte in 2 characters. In hex, your answer would be:

1:   00 00 00 01
2:   00 00 00 02
3:   00 00 00 03
4:   00 00 00 04
17:  00 00 00 11
255: 00 00 00 FF

You've got a lot of learning to do, but keep it up! I think it's wonderful how eager you are to experiment. Hope this helps.

Upvotes: 2
