Reputation: 19
I'm trying to create a C client for dalmatinerdb, but I'm having trouble understanding how to combine the variables, write them to a buffer, and send that buffer to the database. The fact that dalmatinerdb is written in Erlang makes it more difficult. However, by looking at a Python client for dalmatinerdb I have (probably) found the necessary variable sizes and order.
The Erlang client has a function called "encode", see below:
encode({stream, Bucket, Delay}) when
is_binary(Bucket), byte_size(Bucket) > 0,
is_integer(Delay), Delay > 0, Delay < 256->
<<?STREAM,
Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>;
According to the official dalmatinerdb protocol we can see the following:
-define(STREAM, 4).
-define(DELAY_SIZE, 8). % bits
-define(BUCKET_SS, 8). % bits
Let's say I would like to create this kind of structure in C. Would it look something like the following?
struct package {
unsigned char[1] mode; // = "4"
unsigned char[1] delay; // = for example "5"
unsigned char[1] bucketNameSize; // = "5"
unsigned char[1] bucketName; // for example "Test1"
};
Update:
I realized that the dalmatinerdb frontend (web interface) only reacts and updates when values have been sent to the bucket. In other words, just sending the first struct won't give me any clue whether it's right or wrong. Therefore I will try to create a second struct with the actual values.
The Erlang code snippet which encodes values looks like this:
encode({stream, Metric, Time, Points}) when
is_binary(Metric), byte_size(Metric) > 0,
is_binary(Points), byte_size(Points) rem ?DATA_SIZE == 0,
is_integer(Time), Time >= 0->
<<?SENTRY,
Time:?TIME_SIZE/?SIZE_TYPE,
(byte_size(Metric)):?METRIC_SS/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):?DATA_SS/?SIZE_TYPE, Points/binary>>;
The different sizes:
-define(SENTRY, 5).
-define(TIME_SIZE, 64).
-define(METRIC_SS, 16).
-define(DATA_SS, 32).
Which gives me:
<<5,
Time:64/?SIZE_TYPE,
(byte_size(Metric)):16/?SIZE_TYPE, Metric/binary,
(byte_size(Points)):32/?SIZE_TYPE, Points/binary>>;
My guess is that my struct containing a value should look like this:
struct Package {
unsigned char sentry;
uint64_t time;
unsigned char metricSize;
uint16_t metric;
unsigned char pointSize;
uint32_t point;
};
Any comments on this structure?
Upvotes: 0
Views: 227
Reputation: 20024
The binary created by the encode function has this form:
<<?STREAM, Delay:?DELAY_SIZE/?SIZE_TYPE,
(byte_size(Bucket)):?BUCKET_SS/?SIZE_TYPE, Bucket/binary>>
First let's replace all the preprocessor macros with their actual values:
<<4, Delay:8/unsigned-integer,
(byte_size(Bucket)):8/unsigned-integer, Bucket/binary>>
Now we can more easily see that this binary contains:
the value 4 as a byte
the value of Delay as a byte
the size of the Bucket binary as a byte
the Bucket binary itself
Because of the Bucket binary at the end, the overall binary is variable-sized.
A C99 struct that resembles this value can be defined as follows:
struct EncodedStream {
unsigned char mode;
unsigned char delay;
unsigned char bucket_size;
unsigned char bucket[];
};
This approach uses a C99 flexible array member for the bucket field, since its actual size depends on the value set in the bucket_size field. You are presumably using this structure by allocating memory large enough to hold the fixed-size fields together with the variable-sized bucket field, where bucket itself is allocated to hold bucket_size bytes. You could also replace all uses of unsigned char with uint8_t if you #include <stdint.h>. In traditional C, bucket would be defined as a 0- or 1-sized array.
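As a minimal sketch of that allocation, assuming a hypothetical helper named encode_stream (the name, signature, and error handling are illustrative and not taken from any dalmatinerdb client), the message could be built like this:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct EncodedStream {
    uint8_t mode;         /* message type: 4 (?STREAM) */
    uint8_t delay;        /* 1..255, matching the Erlang guard */
    uint8_t bucket_size;  /* length of bucket in bytes */
    uint8_t bucket[];     /* C99 flexible array member, not NUL-terminated */
};

/* Hypothetical helper: builds the stream message in freshly allocated memory.
 * Returns NULL on invalid input or allocation failure; the caller frees the
 * result. *out_len receives the number of bytes to send. */
static struct EncodedStream *encode_stream(const char *bucket, unsigned delay,
                                           size_t *out_len)
{
    size_t n = strlen(bucket);

    /* Mirror the Erlang guards: non-empty bucket, 0 < Delay < 256.
     * The bucket length must also fit in the one-byte size field. */
    if (n == 0 || n > 255 || delay == 0 || delay > 255)
        return NULL;

    /* Fixed-size fields plus room for the bucket name itself. */
    size_t total = sizeof(struct EncodedStream) + n;
    struct EncodedStream *msg = malloc(total);
    if (msg == NULL)
        return NULL;

    msg->mode = 4;
    msg->delay = (uint8_t)delay;
    msg->bucket_size = (uint8_t)n;
    memcpy(msg->bucket, bucket, n);

    *out_len = total;
    return msg;
}
```

Calling encode_stream("Test1", 5, &len) yields an 8-byte buffer that can be passed directly to a socket write, since every field here is a single byte and no padding or byte-order conversion is involved.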
Update: the OP extended the question with another struct, so I've extended my answer below to cover it too.
The obvious-but-wrong way to write a struct corresponding to the metric/time/points binary is:
struct Wrong {
unsigned char sentry;
uint64_t time;
uint16_t metric_size;
unsigned char metric[];
uint32_t points_size;
unsigned char points[];
};
There are two problems with the Wrong struct:
Padding and alignment: Normally, fields are aligned on natural boundaries corresponding to their sizes. Here, the C compiler will align the time field on an 8-byte boundary, which means there will be padding of 7 bytes following the sentry field. But the Erlang binary contains no such padding.
Illegal flexible array field in the middle: The metric field size can vary, but we can't use the flexible array approach for it as we did in the earlier example because such arrays can only be used for the final field of a struct. The fact that the size of metric can vary means that it's impossible to write a single C struct that matches the Erlang binary.
Solving the padding and alignment issue requires using a packed struct, which you can achieve with compiler support such as the gcc and clang __packed__ attribute (other compilers might have other ways of achieving this). The variable-sized metric field in the middle of the struct can be solved by using two structs instead:
typedef struct __attribute__((__packed__)) {
unsigned char sentry;
uint64_t time;
uint16_t size;
unsigned char metric[];
} Metric;
typedef struct __attribute__((__packed__)) {
uint32_t size;
unsigned char points[];
} Points;
Packing both structs means their layouts will match the layouts of the corresponding data in the Erlang binary.
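One way to check the effect of packing is to compare field offsets. The sketch below assumes gcc or clang; the struct names Unpacked and Packed and the two helper functions are made up for illustration, and the unpacked offset depends on the platform's alignment rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size header fields only, laid out the way the compiler prefers. */
struct Unpacked {
    unsigned char sentry;
    uint64_t time;
    uint16_t metric_size;
};

/* Same fields with padding disabled (gcc/clang attribute syntax). */
struct __attribute__((__packed__)) Packed {
    unsigned char sentry;
    uint64_t time;
    uint16_t metric_size;
};

/* Offset of the time field in the default layout (platform-dependent). */
static size_t unpacked_time_offset(void)
{
    return offsetof(struct Unpacked, time);
}

/* Offset of the time field in the packed layout; the wire format needs 1. */
static size_t packed_time_offset(void)
{
    return offsetof(struct Packed, time);
}
```

On a typical 64-bit platform unpacked_time_offset() returns 8, while packed_time_offset() returns 1, which is where the Erlang binary places the time value.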
There's still a remaining problem, though: endianness. By default, fields in an Erlang binary are big-endian. If you happen to be running your C code on a big-endian machine, then things will just work, but if not (and it's likely you're not), the data values your C code reads and writes won't match Erlang.
Fortunately, endianness is easily handled: you can use byte swapping to write C code that can portably read and write big-endian data regardless of the endianness of the host.
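A minimal sketch of host-independent big-endian handling follows. Instead of swapping bytes in place, it writes and reads one byte at a time using shifts, which produces big-endian data on any host. The helper names put_be and get_be are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the low n bytes of v at dst, most significant byte first (n <= 8).
 * Uses shifts on the value, so the host's own byte order never matters. */
static void put_be(unsigned char *dst, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(v >> (8 * (n - 1 - i)));
}

/* Inverse: read an n-byte big-endian unsigned integer from src (n <= 8). */
static uint64_t get_be(const unsigned char *src, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | src[i];
    return v;
}
```

For example, put_be(buf, time, 8) would write a 64-bit time value and put_be(buf, len, 2) a 16-bit size, both in the byte order the Erlang side expects.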
To use the two structs together, you'd first have to allocate enough memory to hold both structs and both the metric and the points variable-length fields. Cast the pointer to the allocated memory (let's call it p) to a Metric*, then use the Metric pointer to store appropriate values in the struct fields. Just make sure you convert the time and size values to big-endian as you store them. You can then calculate a pointer to where the Points struct is in the allocated memory as shown below, assuming p is a pointer to char or unsigned char:
Points* points = (Points*)(p + sizeof(Metric) + <length of Metric.metric>);
Note that you can't just use the size field of your Metric instance for the final addend here since you stored its value as big-endian. Then, once you fill in the fields of the Points struct, again being sure to store the size value as big-endian, you can send p over to Erlang, where it should match what the Erlang system expects.
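Putting these steps together, here is a hedged sketch that assumes gcc or clang for the packed attribute. The function encode_points and the be16/be32/be64 helpers are hypothetical names, and the points argument is treated as already-encoded bytes whose length the caller has validated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct __attribute__((__packed__)) {
    unsigned char sentry;
    uint64_t time;
    uint16_t size;
    unsigned char metric[];
} Metric;

typedef struct __attribute__((__packed__)) {
    uint32_t size;
    unsigned char points[];
} Points;

/* Each helper returns a value whose in-memory bytes are big-endian,
 * regardless of the host's byte order. */
static uint16_t be16(uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v >> 8), (unsigned char)v };
    uint16_t r;
    memcpy(&r, b, sizeof r);
    return r;
}

static uint32_t be32(uint32_t v)
{
    unsigned char b[4] = { (unsigned char)(v >> 24), (unsigned char)(v >> 16),
                           (unsigned char)(v >> 8), (unsigned char)v };
    uint32_t r;
    memcpy(&r, b, sizeof r);
    return r;
}

static uint64_t be64(uint64_t v)
{
    unsigned char b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (unsigned char)(v >> (56 - 8 * i));
    uint64_t r;
    memcpy(&r, b, sizeof r);
    return r;
}

/* Hypothetical helper: builds one metric/time/points message.
 * Returns NULL on invalid input or allocation failure; caller frees. */
static unsigned char *encode_points(const char *metric, uint64_t time,
                                    const unsigned char *points,
                                    uint32_t points_len, size_t *out_len)
{
    size_t metric_len = strlen(metric);
    if (metric_len == 0 || metric_len > UINT16_MAX)
        return NULL;

    size_t total = sizeof(Metric) + metric_len + sizeof(Points) + points_len;
    unsigned char *p = malloc(total);
    if (p == NULL)
        return NULL;

    Metric *m = (Metric *)p;
    m->sentry = 5;                          /* ?SENTRY */
    m->time = be64(time);
    m->size = be16((uint16_t)metric_len);
    memcpy(m->metric, metric, metric_len);

    /* Use the host-order length here, not m->size, which is now big-endian. */
    Points *pts = (Points *)(p + sizeof(Metric) + metric_len);
    pts->size = be32(points_len);
    if (points_len > 0)
        memcpy(pts->points, points, points_len);

    *out_len = total;
    return p;
}
```

The returned buffer of *out_len bytes can then be written to the socket in a single call and released with free afterwards.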
Upvotes: 2