cody

Reputation: 681

How to serialize a struct in C?

I have a struct object that comprises several primitive data types, pointers, and struct pointers. I want to send it over a socket so that it can be used at the other end. Since I want to pay the serialization cost upfront, how do I initialize an object of that struct so that it can be sent immediately without marshalling? For example:

struct A {
    int i;  
    struct B *p;
};

struct B {
    long l;
    char *s[0];
};

struct A *obj; 

// how do I initialize obj?
int len = sizeof(struct A) + sizeof(struct B) + sizeof(?);
obj = (struct A *) malloc(len);
...

write(socket, obj, len);

// on the receiver end, I want to do this
char buf[len];

read(socket, buf, len);
struct A *obj = (struct A *)buf;
int i = obj->i;
char *s = obj->p->s[0];

Thank you.

Upvotes: 11

Views: 35046

Answers (6)

Orankarl

Reputation: 1

I tried the method provided by @RageD but it didn't work.

The int value I got from deserialization was not the original one.

For me, memcpy() works for non-string variables. (You can still use strcpy() for char *)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct A {
    int a;
    char *str;
} test_struct_t;

char *serialize(test_struct_t t) {
    int str_len = strlen(t.str);

    int size = 2 * sizeof(int) + str_len;
    char *buf = malloc(sizeof(char) * (size+1));

    memcpy(buf, &t.a, sizeof(int));
    memcpy(buf + sizeof(int), &str_len, sizeof(int));
    memcpy(buf + sizeof(int) * 2, t.str, str_len);
    buf[size] = '\0';

    return buf;
}

test_struct_t deserialize(char *buf) {
    test_struct_t t;

    memcpy(&t.a, buf, sizeof(int));

    int str_len;
    memcpy(&str_len, buf+sizeof(int), sizeof(int));

    t.str = malloc(sizeof(char) * (str_len+1));
    memcpy(t.str, buf+2*sizeof(int), str_len);
    t.str[str_len] = '\0';

    return t;
}

int main() {
    char str[15] = "Hello, world!";

    test_struct_t t;
    t.a = 123;
    t.str = malloc(strlen(str) + 1);
    strcpy(t.str, str);
    printf("original values: %d %s\n", t.a, t.str);

    char *buf = serialize(t);
    test_struct_t new_t = deserialize(buf);
    printf("new values: %d %s\n", new_t.a, new_t.str);

    // release everything that was allocated with malloc()
    free(buf);
    free(new_t.str);
    free(t.str);

    return 0;
}

And the output of the code above is:

original values: 123 Hello, world!
new values: 123 Hello, world!

Upvotes: 0

Bernardo Ramos

Reputation: 4627

You should serialize the data in a platform independent way.

Here is an example using the Binn library (my creation):

  binn *obj;

  // create a new object
  obj = binn_object();

  // add values to it
  binn_object_set_int32(obj, "id", 123);
  binn_object_set_str(obj, "name", "Samsung Galaxy Charger");
  binn_object_set_double(obj, "price", 12.50);
  binn_object_set_blob(obj, "picture", picptr, piclen);

  // send over the network
  send(sock, binn_ptr(obj), binn_size(obj), 0);

  // release the buffer
  binn_free(obj);

If you don't want to use strings as keys, you can use a binn_map, which uses integers as keys. Lists are also supported, and you can insert one structure inside another (nested structures). For example:

  binn *list;

  // create a new list
  list = binn_list();

  // add values to it
  binn_list_add_int32(list, 123);
  binn_list_add_double(list, 2.50);

  // add the list to the object
  binn_object_set_list(obj, "items", list);

  // or add the object to the list
  binn_list_add_object(list, obj);

Upvotes: 1

RageD

Reputation: 6823

The simplest way to do this may be to allocate a chunk of memory to hold everything. For instance, consider a struct as follows:

typedef struct A {
  int v;
  char* str;
} our_struct_t;

One straightforward approach is to define a format and pack the fields into an array of bytes. Here is an example:

int sLen = 0;
int tLen = 0;
char* serialized = 0;
char* metadata = 0;
char* xval = 0;
char* xstr = 0;
our_struct_t x;
x.v   = 10;
x.str = "Our String";
sLen  = strlen(x.str); // Assuming null-terminated (which ours is)
tLen  = sizeof(int) + sLen; // Our struct has an int and a string - we want the whole string not a mem addr
serialized = malloc(sizeof(char) * (tLen + sizeof(int))); // We have an additional sizeof(int) for metadata - this will hold our string length
metadata = serialized;
xval = serialized + sizeof(int);
xstr = xval + sizeof(int);
*((int*)metadata) = sLen; // Pack our metadata
*((int*)xval) = x.v; // Our "v" value (1 int)
strncpy(xstr, x.str, sLen); // A full copy of our string

So this example copies the data into an array of size 2 * sizeof(int) + sLen which allows us a single integer of metadata (i.e. string length) and the extracted values from the struct. To deserialize, you could imagine something as follows:

char* serialized = // Assume we have this
char* metadata = serialized;
char* yval = metadata + sizeof(int);
char* ystr = yval + sizeof(int);
our_struct_t y;
int sLen = *((int*)metadata);
y.v = *((int*)yval);
y.str = malloc((sLen + 1) * sizeof(char)); // +1 to null-terminate
strncpy(y.str, ystr, sLen);
y.str[sLen] = '\0';

As you can see, our array of bytes is well-defined. Below I have detailed the structure (assuming a 4-byte int):

  • Bytes 0-3 : Metadata (string length)
  • Bytes 4-7 : x.v (value)
  • Bytes 8 to 8 + sLen - 1 : x.str (string bytes, without the null terminator)

This kind of well-defined structure allows you to recreate the struct in any environment that follows the defined convention. How you send this structure over the socket depends on how you design your protocol. You can first send an integer containing the total length of the packet you just constructed, or you can expect the metadata to be sent first/separately (logically separately; technically it can all be sent at the same time), so that the receiver knows how much data to expect. For instance, if I receive a metadata value of 10, I can expect sizeof(int) + 10 bytes to follow to complete the struct. With a 4-byte int, that is 14 bytes.
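As a rough sketch of the "send the total length first" option, the packet can be framed with a 4-byte length prefix. The names send_packet, recv_packet, write_all and read_all are made up for this illustration, and sending the length in network byte order via htonl() is my own assumption rather than something the format above requires:

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>

/* Write exactly n bytes, looping over short writes. 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t n)
{
    const unsigned char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w <= 0)
            return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Read exactly n bytes, looping over short reads. 0 on success, -1 on error/EOF. */
static int read_all(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r <= 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Hypothetical helper: send a 4-byte length (network byte order), then the payload. */
int send_packet(int fd, const void *payload, uint32_t len)
{
    uint32_t be_len = htonl(len);
    if (write_all(fd, &be_len, sizeof be_len) != 0)
        return -1;
    return write_all(fd, payload, len);
}

/* Hypothetical helper: receive one packet into a malloc'd buffer (caller frees).
 * A real protocol should also reject unreasonably large lengths. */
void *recv_packet(int fd, uint32_t *len_out)
{
    uint32_t be_len;
    if (read_all(fd, &be_len, sizeof be_len) != 0)
        return NULL;

    uint32_t len = ntohl(be_len);
    unsigned char *buf = malloc(len > 0 ? len : 1);
    if (buf == NULL)
        return NULL;
    if (read_all(fd, buf, len) != 0) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

The loops matter because read() and write() on a socket may transfer fewer bytes than requested, so a single call is not guaranteed to move the whole packet.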

EDIT

I will list some clarifications as requested in the comments.

I do a full copy of the string so that it is in (logically) contiguous memory. That is, all the data in my serialized packet is actual data; there are no pointers. This way, we can send a single buffer (which we call serialized) over the socket. If we simply sent the pointer, the receiver would expect that pointer to be a valid memory address. However, it is unlikely that your memory addresses will be exactly the same. Even if they are, the receiver will not have the same data at that address as you do (except in very limited and specialized circumstances).

Hopefully this point is made more clear by looking at the deserialization process (this is on the receiver's side). Notice how I allocate a struct to hold the information sent by the sender. If the sender did not send me the full string but instead only the memory address, I could not actually reconstruct the data which was sent (even on the same machine we have two distinct virtual memory spaces which are not the same). So in essence, a pointer is only a good mapping for the originator.

Finally, as far as "structs within structs" go, you will need a serialization function for each struct. That said, you can reuse these functions. For instance, if I have two structs A and B where A contains B, I can have two serialize functions:

char* serializeB()
{
  // ... Do serialization
}

char* serializeA()
{
  char* B = serializeB();
  // ... Either add on to serialized version of B or do some other modifications to combine the structures
}

So you should be able to get away with a single serialization method for each struct.
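To make the composition a bit more concrete, here is a minimal sketch in which the serializer for A writes its own field and then delegates to the serializer for B. The struct layouts, the function names (size_a, size_b, serialize_a, serialize_b) and the choice of writing into a caller-supplied buffer are all assumptions for this illustration. As in the example above, values are copied in host byte order:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structs: A contains a B, and B owns a string. */
struct B { int32_t l; const char *s; };
struct A { int32_t i; struct B b; };

/* Number of bytes serialize_b() writes: l, string length, string bytes. */
size_t size_b(const struct B *b)
{
    return 2 * sizeof(int32_t) + strlen(b->s);
}

/* Pack B into out; returns the number of bytes written. */
size_t serialize_b(const struct B *b, unsigned char *out)
{
    int32_t slen = (int32_t)strlen(b->s);
    size_t off = 0;

    memcpy(out + off, &b->l, sizeof b->l);  off += sizeof b->l;
    memcpy(out + off, &slen, sizeof slen);  off += sizeof slen;
    memcpy(out + off, b->s, (size_t)slen);  off += (size_t)slen;
    return off;
}

/* Number of bytes serialize_a() writes: i, followed by the packed B. */
size_t size_a(const struct A *a)
{
    return sizeof(int32_t) + size_b(&a->b);
}

/* Pack A into out by writing i and then reusing serialize_b(). */
size_t serialize_a(const struct A *a, unsigned char *out)
{
    size_t off = 0;

    memcpy(out + off, &a->i, sizeof a->i);  off += sizeof a->i;
    off += serialize_b(&a->b, out + off);
    return off;
}
```

The caller would allocate size_a(&a) bytes and pass that buffer to serialize_a(). Returning the number of bytes written is what lets each function append after the previous one.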

Upvotes: 5

hdante

Reputation: 8030

Interpret your data and understand what you want to serialize. You want to serialize an integer and a structure of type B (recursively, you want to serialize an int, a long, and an array of strings). Then serialize them. The length you need is sizeof(int) + sizeof(long) + ∑(strlen(s[i]) + 1).
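The length formula can be sketched as a small function. The name serialized_len and the explicit count parameter are assumptions of mine, since the struct in the question does not record how many strings s holds:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed for an int, a long, and `count`
 * strings, each stored together with its terminating '\0'. */
size_t serialized_len(const char *const *s, size_t count)
{
    size_t len = sizeof(int) + sizeof(long);

    for (size_t i = 0; i < count; i++)
        len += strlen(s[i]) + 1;   /* +1 for the null terminator */
    return len;
}
```

Note that the receiver also has to learn how many strings there are, so in practice you would probably send the count as well (or end the list with an empty string).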

On the other hand, serialization is a solved problem (many times over, actually). Are you sure you need to hand-write a serialization routine? Why not use D-Bus or a simple RPC call? Please consider using them.

Upvotes: 0

Shahbaz

Reputation: 47603

This answer sets aside the problems with your malloc.

Unfortunately, you cannot find a nice trick that would still be compatible with the standard. The only way of properly serializing a structure is to separately dissect each element into bytes, write them to an unsigned char array, send them over the network and put the pieces back together on the other end. In short, you would need a lot of shifting and bitwise operations.

In certain cases you would need to define a kind of protocol. In your case, for example, you need to make sure you always put the object p points to right after struct A, so that once it is recovered you can set the pointer properly. Has everyone said often enough already that you can't send pointers over the network?

Another protocol-ish thing you may want to do is to write the size allocated for the flexible array member s in struct B. Whatever layout you choose for your serialized data, both sides obviously have to respect it.

It is important to note that you cannot rely on anything machine-specific, such as byte order, structure padding, or the size of basic types. This means that you should serialize each field separately and assign each a fixed number of bytes.
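As a minimal sketch of the shifting approach for one fixed-width field, a 32-bit value can be written as exactly four bytes, most significant byte first. The names put_u32 and get_u32 and the choice of big-endian order are assumptions for illustration; any order works as long as both sides agree on it:

```c
#include <stdint.h>

/* Hypothetical helper: store v as 4 bytes, most significant byte first. */
void put_u32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Hypothetical helper: rebuild the value from 4 bytes written by put_u32(). */
uint32_t get_u32(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because the bytes are produced by arithmetic on the value rather than by copying its in-memory representation, the result does not depend on the host's byte order or on struct padding.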

Upvotes: 5

spartacus

Reputation: 613

@Shahbaz is right. I think you actually want this:

int len = sizeof(struct A);
obj = (struct A *) malloc(len);

But also you will run into problems when sending a pointer to another machine as the address the pointer points to means nothing on the other machine.

Upvotes: -1
