ChrisoLosoph

Reputation: 607

Reversing bytes and cross-compatible binary parsing in Nim

I've started taking a look at Nim for hobby game modding purposes.

Intro

However, I found it harder to work with Nim than with C when it comes to machine-specific low-level memory layout, and I would like to know whether Nim actually has better support for this.

I need to control byte order and be able to (de)serialize arbitrary Plain-Old-Datatype objects to custom binary file formats. I didn't find a Nim library that allows flexible storage options such as representing enums and pointers as 32-bit big-endian values. Or maybe I just don't know how to use the feature.

By flexible cross-compatibility I mean that it must be able to (de)serialize fields independently of Nim's ABI, but with customization options.

Maybe "Kaitai Struct" is closer to what I'm looking for: a file parser with experimental Nim support.

TL;DR

As a workaround for a serialization library, I tried writing a recursive "member field reverser" that uses std/endians, which is almost sufficient.
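A minimal sketch of such a recursive reverser, assuming the objects contain only nested objects, tuples, arrays and fixed-size scalars (strings, seqs and refs are rejected at compile time). The name reverseFields is hypothetical; the sketch relies on the fields iterator, which Nim unrolls at compile time so that each loop variable stands for the actual field, and on swapEndian16/32/64 from std/endians:

```nim
import std/endians

proc reverseFields[T](x: var T) =
  ## Recursively flips the byte order of every scalar field of `x`.
  ## Assumption: only objects, tuples, arrays and fixed-size scalars occur.
  when T is object | tuple:
    for f in x.fields:            # unrolled loop: `f` is x.<each field>
      reverseFields(f)
  elif T is array:
    for e in x.mitems:
      reverseFields(e)
  elif T is SomeNumber | enum | char | bool:
    when sizeof(T) > 1:           # 1-byte values need no swapping
      # The portable fallback in std/endians may copy byte by byte, so
      # never pass the same address as input and output; use a copy.
      var tmp = x
      when sizeof(T) == 2: swapEndian16(addr x, addr tmp)
      elif sizeof(T) == 4: swapEndian32(addr x, addr tmp)
      elif sizeof(T) == 8: swapEndian64(addr x, addr tmp)
      else: {.error: "reverseFields: unsupported scalar size".}
  else:
    {.error: "reverseFields: unsupported field type".}
```

Only the scalars are swapped, so field order and padding stay as they are. That is usually what a file-format conversion needs, whereas reversing the whole object would also reverse the field order.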

However, I didn't succeed in implementing byte reversal of arbitrarily long objects in Nim. It is not practically relevant, but I still wonder whether Nim has a solution.

I found reverse() and reversed() in std/algorithm, but I need a byte array to reverse and then turn back into the original object type. In C++ there would be reinterpret_cast, in C there is the void* cast, and in D there is the void[] cast (D allows defining array slices from pointers), but I couldn't get the equivalent working in Nim.

I tried cast[ptr array[value.sizeof, byte]](unsafeAddr value)[], but I couldn't assign the result to a new variable. Maybe the problem was something else.
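One way to sidestep the assignment problem is to never materialize an array value at all and instead view both the input and the result through ptr UncheckedArray[byte]. A minimal sketch, assuming T is a plain-old-data type (byteReversed is a hypothetical name):

```nim
proc byteReversed[T](value: T): T =
  ## Returns a copy of `value` whose raw bytes are in reverse order.
  ## Only meaningful for plain-old-data types (no strings, seqs or refs).
  var src = value                # local copy so we can take its address
  let s = cast[ptr UncheckedArray[byte]](addr src)
  let d = cast[ptr UncheckedArray[byte]](addr result)
  let n = sizeof(T)
  for i in 0 ..< n:
    d[i] = s[n - 1 - i]
```

For example, byteReversed(0x11223344'u32) yields 0x44332211 on any host, since it only reorders the bytes in memory.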

How can I byte-reverse arbitrarily long Plain-Old-Datatype objects?

How can I serialize to binary files with control over byte order and member field size, and with pointers stored as file offsets ("offset - start offset")? Are there bitfield options in Nim?
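For the byte-order part, a minimal sketch with std/streams and std/endians might look like the following. The helper names (writeBE32, readBE32, writeEnumBE32, writeOffsetBE32) are hypothetical, and storing a pointer as "target offset - start offset" assumes both stream positions are already known when writing:

```nim
import std/[streams, endians]

proc writeBE32(s: Stream, v: uint32) =
  ## Writes `v` as 4 big-endian bytes, whatever the host byte order is.
  var src = v
  var be: uint32
  bigEndian32(addr be, addr src)
  s.writeData(addr be, sizeof(be))

proc readBE32(s: Stream): uint32 =
  ## Reads 4 big-endian bytes and returns the value in host byte order.
  var raw: uint32
  if s.readData(addr raw, sizeof(raw)) != sizeof(raw):
    raise newException(IOError, "unexpected end of stream")
  bigEndian32(addr result, addr raw)

proc writeEnumBE32[E: enum](s: Stream, e: E) =
  ## Hypothetical helper: stores an enum as a 32-bit big-endian ordinal.
  s.writeBE32(uint32(ord(e)))

proc writeOffsetBE32(s: Stream, targetPos, startPos: int) =
  ## Hypothetical helper: stores a "pointer" as a relative file offset.
  ## Assumes targetPos >= startPos (a negative offset raises on conversion).
  s.writeBE32(uint32(targetPos - startPos))
```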

Upvotes: 2

Views: 1024

Answers (2)

shirleyquirk

Reputation: 1598

It is indeed possible to use algorithm.reverse and the appropriate cast invocation to reverse bytes in-place:

import std/[algorithm,strutils,strformat]

type
  LittleEnd{.packed.} = object
    a: int8
    b: int16
    c: int32
  BigEnd{.packed.} = object
    c: int32
    b: int16
    a: int8

## just so we can see what's going on:
proc `$`(b: LittleEnd):string = &"(a:0x{b.a.toHex}, b:0x{b.b.toHex}, c:0x{b.c.toHex})"
proc `$`(l:BigEnd):string = &"(c:0x{l.c.toHex}, b:0x{l.b.toHex}, a:0x{l.a.toHex})"


var lit = LittleEnd(a: 0x12, b:0x3456, c: 0x789a_bcde)
echo lit # (a:0x12, b:0x3456, c:0x789ABCDE)

var big:BigEnd

copyMem(big.addr,lit.addr,sizeof(lit))

# here's the reinterpret_cast you were looking for:
cast[ptr array[sizeof(big), byte]](big.addr)[].reverse

echo big # (c:0xDEBC9A78, b:0x5634, a:0x12)

For C-style bitfields there is also the {.bitsize.} pragma, but using it causes Nim to lose sizeof information, and of course bitfields won't be reversed within bytes:

import std/[algorithm,strutils,strformat]

type
  LittleNib{.packed.} = object
    a{.bitsize: 4.}: int8
    b{.bitsize: 12.}: int16
    c{.bitsize: 20.}: int32
    d{.bitsize: 28.}: int32
  BigNib{.packed.} = object
    d{.bitsize: 28.}: int32
    c{.bitsize: 20.}: int32
    b{.bitsize: 12.}: int16
    a{.bitsize: 4.}: int8
const nibsize = 8

proc `$`(b: LittleNib):string = &"(a:0x{b.a.toHex(1)}, b:0x{b.b.toHex(3)}, c:0x{b.c.toHex(5)}, d:0x{b.d.toHex(7)})"
proc `$`(l:BigNib):string = &"(d:0x{l.d.toHex(7)}, c:0x{l.c.toHex(5)}, b:0x{l.b.toHex(3)}, a:0x{l.a.toHex(1)})"
var lit = LittleNib(a: 0x1, b: 0x234, c: 0x56789, d: 0x0abcdef)
echo lit # (a:0x1, b:0x234, c:0x56789, d:0x0ABCDEF)


var big:BigNib

copyMem(big.addr,lit.addr,nibsize)
cast[ptr array[nibsize, byte]](big.addr)[].reverse
echo big # (d:0x5DEBC0A, c:0x8967F, b:0x123, a:0x4)
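Because the layout of {.bitsize.} fields is left to the C compiler (the C standard makes bit-field allocation order implementation-defined), a more portable option is to pack the fields into an integer by hand with shifts and masks. A minimal sketch for the same 4/12/20/28-bit fields as LittleNib, assuming they are stored from the least significant bit upwards (Nibs, packNibs and unpackNibs are hypothetical names):

```nim
type Nibs = tuple[a, b, c, d: uint64]   # hypothetical: one uint64 per field for simplicity

proc packNibs(n: Nibs): uint64 =
  ## a -> bits 0..3, b -> bits 4..15, c -> bits 16..35, d -> bits 36..63
  (n.a and 0xF'u64) or
    ((n.b and 0xFFF'u64) shl 4) or
    ((n.c and 0xF_FFFF'u64) shl 16) or
    ((n.d and 0xFFF_FFFF'u64) shl 36)

proc unpackNibs(v: uint64): Nibs =
  (a: v and 0xF'u64,
   b: (v shr 4) and 0xFFF'u64,
   c: (v shr 16) and 0xF_FFFF'u64,
   d: (v shr 36) and 0xFFF_FFFF'u64)
```

With the values used above (a: 0x1, b: 0x234, c: 0x56789, d: 0x0abcdef), packNibs gives 0x0ABCDEF567892341, which can then be written in whatever byte order the file format requires, for example with bigEndian64 from std/endians.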

Copying the bytes over and then rearranging them with reverse is less than optimal anyway, so you might prefer to copy the bytes in reverse order in a single loop. Here's a proc that can swap the endianness of any object (including ones whose sizeof is not known at compile time):

template asBytes[T](x:var T):ptr UncheckedArray[byte] = 
  cast[ptr UncheckedArray[byte]](x.addr)

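proc swapEndian[T,U](src:var T,dst:var U) =
  ## Copies the bytes of src into dst in reverse order.
  ## src and dst must be distinct objects of the same size:
  ## reversing in place this way would overwrite bytes before reading them.
  assert sizeof(src) == sizeof(dst)
  let len = sizeof(src)
  for i in 0..<len:
    dst.asBytes[len - i - 1] = src.asBytes[i]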
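Since swapEndian needs two distinct objects, reversing a single object in place needs a different loop. One option is to swap bytes from both ends toward the middle. A minimal sketch, assuming a plain-old-data type (reverseBytesInPlace is a hypothetical name):

```nim
proc reverseBytesInPlace[T](x: var T) =
  ## Reverses the raw bytes of `x` without a second buffer by swapping
  ## the outermost pair and moving toward the middle.
  let p = cast[ptr UncheckedArray[byte]](addr x)
  var lo = 0
  var hi = sizeof(x) - 1
  while lo < hi:
    swap(p[lo], p[hi])
    inc lo
    dec hi
```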

Upvotes: 2

Grzegorz Adam Hankiewicz

Reputation: 7681

Single-bit flags (C-style bit masks) are supported in Nim as a set of an enum type:

type
  MyFlag* {.size: sizeof(cint).} = enum
    A
    B
    C
    D
  MyFlags = set[MyFlag]

proc toNum(f: MyFlags): int = cast[cint](f)
proc toFlags(v: int): MyFlags = cast[MyFlags](v)

assert toNum({}) == 0
assert toNum({A}) == 1
assert toNum({D}) == 8
assert toNum({A, C}) == 5
assert toFlags(0) == {}
assert toFlags(7) == {A, B, C}

For arbitrary bit operations you have the bitops module, and for endianness conversions you have the endians module. But you already know about the endians module, so it's not clear what problem you are trying to solve with the so-called byte reversal. Usually you have an integer, so you first convert it to big-endian format, for instance, and then save that. When you read it back, you convert from big-endian format and have the int again. The endianness procs already decide whether the bytes need to be reversed or not, so why do you need to do it yourself? In any case, you can follow the source hyperlink in the documentation to see how the endian procs are implemented. This can give you an idea of how to cast values in case you need to do some of this yourself.
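The convert-on-write, convert-on-read workflow can be sketched with bigEndian32 from std/endians, which swaps the bytes only when the host is little-endian. In this sketch toBE32, fromBE32 and splitTag are hypothetical names, and a fixed-size byte array stands in for the file contents:

```nim
import std/endians

proc toBE32(v: uint32): array[4, byte] =
  ## Host integer -> 4 bytes in big-endian order (what you would write).
  var src = v
  bigEndian32(addr result, addr src)

proc fromBE32(b: array[4, byte]): uint32 =
  ## 4 big-endian bytes (as read from the file) -> host integer.
  var src = b
  bigEndian32(addr result, addr src)

# Bit fields can then be taken from the decoded integer with plain
# shifts and masks (hypothetical layout: 4-bit tag + 28-bit value).
proc splitTag(v: uint32): tuple[tag, value: uint32] =
  (v shr 28, v and 0x0FFF_FFFF'u32)
```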

Since you know C, a last resort would be to write a few serialization functions in C and call them from Nim, or embed them directly using the emit pragma. However, this looks like the least cross-platform and least pain-free option.
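For completeness, a minimal sketch of the emit route, assuming the C backend: a small C helper is pasted into the generated C file and declared on the Nim side with importc. The C function bswap32_c and the Nim name bswap32C are hypothetical:

```nim
# Requires the C (or C++) backend: this C code goes into the generated file.
{.emit: """
#include <stdint.h>
static uint32_t bswap32_c(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0xFF00u) | ((v << 8) & 0xFF0000u) | (v << 24);
}
""".}

# Nim-side declaration of the emitted helper; nodecl keeps Nim from
# generating its own C prototype for it.
proc bswap32C(v: uint32): uint32 {.importc: "bswap32_c", nodecl.}
```

This does not work with the JavaScript backend, which is one reason it is less portable than std/endians.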

I can't say anything about generic data structure serialization libraries. I stay away from them because they tend to require hand-holding and impose certain limitations on your code, and, depending on the feature set, a simple refactoring (changing the field order in your POD) may destroy the binary compatibility of the generated output without you noticing it until runtime. So you end up spending additional time writing unit tests to verify that the black box you brought in to save you time behaves as you want (and keeps doing so across refactorings and version upgrades!).
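One way to keep such tests cheap when serializing by hand is a golden-bytes check: write each field explicitly in file order and compare the output against a fixed byte string, so that reordering fields in the Nim declaration cannot silently change the format. A minimal sketch with a hypothetical Header record and writeBE helpers:

```nim
import std/[streams, endians]

type
  Header = object        # hypothetical record; declaration order is irrelevant
    version: uint16
    magic: uint32

proc writeBE(s: Stream, v: uint16) =
  var src = v
  var dst: uint16
  bigEndian16(addr dst, addr src)
  s.writeData(addr dst, sizeof(dst))

proc writeBE(s: Stream, v: uint32) =
  var src = v
  var dst: uint32
  bigEndian32(addr dst, addr src)
  s.writeData(addr dst, sizeof(dst))

proc serialize(h: Header): string =
  ## Writes the fields explicitly in file order (magic first), so the
  ## on-disk layout does not depend on the Nim declaration order.
  let s = newStringStream()
  s.writeBE(h.magic)
  s.writeBE(h.version)
  result = s.data

# Golden-bytes check: fails loudly if the binary layout ever changes.
doAssert serialize(Header(magic: 0x4E494D21, version: 2)) ==
  "\x4E\x49\x4D\x21\x00\x02"
```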

Upvotes: 1
