tim

Reputation: 43

transferring binary files between systems

I'm trying to transfer my files between two UNIX clusters. The data is purely numeric (vectors of double) in binary form. Unfortunately, one of the systems is an IBM ppc997 and the other is an AMD Opteron, and it seems the binary number formats on these systems are different.

I have tried three approaches so far:

1- Converted my files to ASCII format (i.e. saved one number per line in a text file), sent them to the destination, and converted them back to binary on the target system (both are UNIX, so there should be no end-of-line character difference??!)

2- Sent the pure binaries to the destination

3- Used uuencode, sent the files to the destination, and decoded them there

Unfortunately, none of these methods works (my code on the destination system generates garbage, while it works on the first system; I'm 100% sure the code itself is portable). I don't know what else I can do. Do you have any ideas? I'm not a professional, so please don't use computer science terminology!

And: my code is in C, so by binary I mean a one-to-one mapping between memory and hard disk.

Thanks

Upvotes: 0

Views: 1187

Answers (5)

Dummy00001

Reputation: 17420

The details provided are scarce, so I'm answering to the best of my understanding.

.. one of the systems is IBM ppc997 and the other is AMD Opteron

The former system generally (*) uses big-endian representation, the latter little-endian. Read this.

(*) It depends on the OS. IBM's POWER CPU can do both little and big endian, but no OS actually running on them uses little-endian mode.

Normally, one picks a single endianness for the binary representation and sticks with it. For network data, big-endian is the norm.

That means all places which do something like this:

/* writing to binary */
int a = 1234;
write(fd,&a,sizeof(a));
/* reading from binary */
int x;
read(fd,&x,sizeof(x));

should be converted to something like this:

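/* writing to binary */
uint32_t a = htonl(1234);   /* htonl()/ntohl() are declared in <arpa/inet.h> */
write(fd, &a, sizeof(a));
/* reading from binary */
uint32_t x;
read(fd, &x, sizeof(x));
x = ntohl(x);               /* back to the byte order of this machine */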
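Note that htonl() and ntohl() only cover 32-bit integers, while the question is about doubles. Here is a minimal sketch of the same idea for 8-byte values. It assumes both machines use 64-bit IEEE 754 doubles and store them in the same byte order as their integers. The names put_double_be and get_double_be are made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Store a double as 8 bytes, most significant byte first (big-endian),
   regardless of the byte order of the machine running this code.
   Assumption: double is a 64-bit IEEE 754 value on both machines. */
void put_double_be(unsigned char out[8], double value)
{
    uint64_t bits;
    int i;

    memcpy(&bits, &value, sizeof bits);   /* copy the raw bit pattern */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Rebuild a double from 8 big-endian bytes produced by put_double_be(). */
double get_double_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    double value;
    int i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The writer would call put_double_be() for each element and fwrite() the 8 bytes. The reader would fread() 8 bytes and call get_double_be(). Because the shifts work on values rather than on memory layout, the same source compiles unchanged on both machines.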

Another approach is to save an endianness indicator along with the binary data (e.g. write a magic number and check it on the other side: MAGIC = 0x12345678 vs. MAGIC = 0x78563412) and apply the conversion only when the endianness differs. That approach is less elegant, though, and has no real advantages that I'm aware of.
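For completeness, the magic check could look roughly like this. It is only a sketch: it assumes the writer stores the 32-bit magic first, in its own native byte order, and the function name check_magic is invented for this example.

```c
#include <stdint.h>

#define FILE_MAGIC         0x12345678u   /* value the writer stores */
#define FILE_MAGIC_SWAPPED 0x78563412u   /* same bytes in reverse order */

/* Interpret the first 4 bytes of the file, read as a native uint32_t.
   Returns  0 if the file uses the reader's byte order,
            1 if every multi-byte value must be byte-swapped after reading,
           -1 if the header is not recognised at all. */
int check_magic(uint32_t magic_from_file)
{
    if (magic_from_file == FILE_MAGIC)
        return 0;
    if (magic_from_file == FILE_MAGIC_SWAPPED)
        return 1;
    return -1;
}
```

The reader would call this once after reading the header and then decide whether to swap each value that follows.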

Upvotes: 2

Juliano

Reputation: 41397

The code is not 100% portable if you are writing memory contents to files.

You need something called serialization. Ok, computer science term, but it basically means that you get your data and transform it into a well-defined and documented sequence of bytes, which can be read back to memory later by the same or another program. This sequence of bytes is architecture and platform-independent.

Most Unix environments already come with an XDR implementation, which provides routines for data serialization.

A simple example encoding 4 doubles to stdout (you can use shell redirection, or use fopen() to open a file instead of stdout):

XDR xdrs;
double data[4] = { 1.0, 255.41, -357.1, 123.4 };
int i;

xdrstdio_create(&xdrs, stdout, XDR_ENCODE);
for (i = 0; i < 4; i++)
    xdr_double(&xdrs, &data[i]);

Now, to get these doubles back (from stdin) and print them:

XDR xdrs;
double data;
int i;

xdrstdio_create(&xdrs, stdin, XDR_DECODE);
for (i = 0; i < 4; i++) {
    xdr_double(&xdrs, &data);
    printf("%g\n", data);
}

You can encode and decode complex structures using XDR. This was a very dumb way of sending four doubles to a file, and generally you should instead use xdr_array() to read/write arrays of some data type. The same commands, in the same order, have to be executed when saving and when loading the file. In fact, you can use rpcgen to generate C structs and their corresponding xdr functions automatically.

Upvotes: 3

MSN

Reputation: 54604

All processors that support IEEE 754 have the same binary representation for floats (technically called singles) and doubles. The only difference will be in the endianness of the processor.

So the only incompatibility between the IBM PPC and the AMD Opteron should be the endianness of the doubles.

When you byteswap the doubles from disk to memory, DON'T DO THIS:

double swap(double a); // THIS IS NEVER THE RIGHT THING TO DO.

Passing in the double by value may pass it in through floating point registers. Because not all bit combinations are valid doubles, the processor may silently convert the double to a NaN, which may have a different bit representation than the value passed in. This is more likely to happen with a valid double that is in the opposite endian order. (See here for a more detailed explanation.)

In other words, pass the double you want to byteswap as a pointer or an array of chars. (Array of chars should be the best bet.)
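A minimal sketch of the char-array approach, assuming 8-byte doubles on both machines (swap8 and double_from_foreign are made-up names):

```c
#include <string.h>

/* Reverse 8 bytes in place. The bytes are only ever handled as
   unsigned char, so they never pass through a floating point register
   while they are in the wrong order. */
void swap8(unsigned char *p)
{
    int i;

    for (i = 0; i < 4; i++) {
        unsigned char tmp = p[i];
        p[i] = p[7 - i];
        p[7 - i] = tmp;
    }
}

/* Turn 8 bytes read from a file written on the opposite-endian machine
   into a native double. */
double double_from_foreign(const unsigned char raw[8])
{
    unsigned char tmp[8];
    double value;

    memcpy(tmp, raw, 8);                 /* leave the caller's buffer untouched */
    swap8(tmp);
    memcpy(&value, tmp, sizeof value);   /* only now does it become a double */
    return value;
}
```

The file would be read with fread() into an unsigned char buffer, and each group of 8 bytes converted with double_from_foreign().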

Upvotes: 0

Jens Gustedt

Reputation: 78923

Solutions 2 and 3 will generally not work, since different processors might use different internal representations of your numbers. For integers (not floats or doubles), you could get away with something that just takes care of the byte order of your different machines. Floating point representations are much trickier, and you would have to look up in detail which representations your architectures use. Even then, for double, for example, there is only a minimum requirement on the precision, so you might find yourself having to truncate to the smaller of the two representations. These problems have little to do with the OS you are using (Unix or not) and a lot to do with how the hardware likes to have things.

Upvotes: 0

Dirk is no longer here

Reputation: 368251

Method 1 should work. Just create a test vector with values 1, 2, ..., 10 and send it across. You can read the ASCII that was created (so you can validate the 'export') and therefore also check the 'import' step of re-reading the file. You may lose precision this way, but it should get you operational.
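On the precision point: the loss can usually be avoided by printing enough digits. Here is a rough sketch, assuming IEEE 754 doubles and a C library that converts decimal text correctly. The function names are invented for this example. With 17 significant digits, a double written as text should read back as exactly the same value.

```c
#include <stdio.h>

/* Write n doubles as text, one per line, with 17 significant digits.
   Returns 0 on success, -1 on a write error. */
int write_doubles_text(FILE *fp, const double *v, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (fprintf(fp, "%.17g\n", v[i]) < 0)
            return -1;
    return 0;
}

/* Read up to n doubles written by write_doubles_text().
   Returns the number of values actually read. */
size_t read_doubles_text(FILE *fp, double *v, size_t n)
{
    size_t i = 0;

    while (i < n && fscanf(fp, "%lf", &v[i]) == 1)
        i++;
    return i;
}
```

Comparing the count returned by read_doubles_text() with the expected vector length is a cheap way to catch a truncated or damaged transfer.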

Method 2 will work once you use a library such as XDR that deals with the different endianness. These things used to be a bigger problem 'way back when', and there are solutions. This is, for example, how systems like R let you share binary files between architectures.

Method 3 is not needed unless you do something really awkward when transferring the file.

Upvotes: 2
