schefflera

Reputation: 13

Encoding/decoding Berkeley DB records in Python 3

I have a pre-existing Berkeley DB database that is written to and read by a program written in C++. I need to bypass this program and write to the database directly using Python.

I can open and write to the database, but I am having a heck of a time encoding my data so that it is in the right form to be read by the original C++ program. In fact, I can't even figure out how to decode the existing data, although I know what the values are.

The keys of the key/value pairs in the database should be timestamps in the form YYYYMMDDHHmmSS. The values should be five doubles and an int mashed together. By that I mean that the following struct, DVALS, from the C++ program's source code

typedef struct
{
  double d1;
  double d2;
  double d3;
  double d4;
  double d5;
  int i1;
} DVALS;

is written to the database as the value of the key value pair like so:

DBT data;
memset(&data, 0, sizeof(DBT));

DVALS dval;
memset(&dval, 0, sizeof(DVALS));
data.data = &dval;
data.size = sizeof(DVALS);

db->put(db, NULL, &key, &data, 0);
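For comparison, here is a minimal sketch of building the same key and value bytes in Python. It assumes, based on the db_dump output further down, that the C program stores the key as the NUL-terminated timestamp string (strlen + 1 bytes), and that the compiler laid DVALS out as five little-endian doubles, a 4-byte int and 4 bytes of trailing padding. `make_record` is a hypothetical helper name.

```python
import struct

# Assumed layout of DVALS: 5 little-endian doubles, one 4-byte int,
# then 4 bytes of trailing padding (48 bytes total).
DVALS_FORMAT = "<5di4x"

def make_record(timestamp, d1, d2, d3, d4, d5, i1):
    """Return (key, value) bytes mirroring the C program's db->put call."""
    # The C key appears to include the string's NUL terminator.
    key = timestamp.encode("ascii") + b"\x00"
    # Equivalent of data.data = &dval; data.size = sizeof(DVALS);
    # the pad bytes are packed as zeros, like the memset in the C code.
    value = struct.pack(DVALS_FORMAT, d1, d2, d3, d4, d5, i1)
    return key, value

key, value = make_record("20190314155300", 32.11, 32.11, 32.11, 32.11, 2651, 0)
print(key)         # b'20190314155300\x00'
print(len(value))  # 48
```

With bsddb3 the resulting pair could then be stored with `myDB.put(key, value)`.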

Luckily, I know what the values are. So if I run the following from the command line

db_dump myfile

the final record is:

323031393033313431353533303000
ae47e17a140e4040ae47e17a140e4040ae47e17a140e4040ae47e17a140e40400000000000b6a4400000000000000000
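The two lines db_dump prints are just the key bytes and the value bytes in hexadecimal, so they can be turned back into `bytes` with `bytes.fromhex` and compared with what bsddb3 returns. A minimal sketch, with the hex strings copied from the dump above:

```python
key_hex = "323031393033313431353533303000"
value_hex = (
    "ae47e17a140e4040ae47e17a140e4040ae47e17a140e4040ae47e17a140e4040"
    "0000000000b6a4400000000000000000"
)

key = bytes.fromhex(key_hex)
value = bytes.fromhex(value_hex)

print(key)         # b'20190314155300\x00'  (ASCII digits plus a NUL terminator)
print(len(value))  # 48, i.e. sizeof(DVALS) on the machine that wrote it
```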

Using Python's bsddb3 module I can pull this record out too:

from bsddb3 import db
myDB = db.DB()
myDB.open('myfile', None, db.DB_BTREE)  # open the existing B-tree database
cur = myDB.cursor()
kvpair = cur.last()  # (key, value) of the last record, both as bytes

With kvpair now holding the following information:

(b'20190314155300\x00', b'\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\x00\x00\x00\x00\x00\xb6\xa4@\x00\x00\x00\x00\x00\x00\x00\x00')

The timestamp is easy to read, and in this case the actual values are as follows:
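Because the key is ASCII digits followed by a NUL byte, it is the only part of the record that can be decoded as text. A minimal sketch of converting between key bytes and a `datetime`, assuming every key follows the YYYYMMDDHHmmSS pattern plus a trailing NUL; `key_to_datetime` and `datetime_to_key` are hypothetical helper names:

```python
from datetime import datetime

KEY_FORMAT = "%Y%m%d%H%M%S"  # YYYYMMDDHHmmSS

def key_to_datetime(key):
    """Strip the trailing NUL and parse the ASCII timestamp."""
    return datetime.strptime(key.rstrip(b"\x00").decode("ascii"), KEY_FORMAT)

def datetime_to_key(dt):
    """Format a datetime as a NUL-terminated key, as the C program appears to."""
    return dt.strftime(KEY_FORMAT).encode("ascii") + b"\x00"

ts = key_to_datetime(b"20190314155300\x00")
print(ts)                   # 2019-03-14 15:53:00
print(datetime_to_key(ts))  # b'20190314155300\x00'
```

Because the timestamps are fixed-width, the B-tree's bytewise key ordering is also chronological, which is why `cur.last()` returns the newest record.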

d1 = d2 = d3 = d4 = 32.11
d5 = 2651
i1 = 0

Since the sequence '\xaeG\xe1z\x14\x0e@@' is repeated four times, I think it corresponds to the value 32.11.
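That guess can be checked directly by packing the known values as 8-byte IEEE 754 doubles in little-endian order (an assumption about the machine the C program runs on) and comparing the result with the bytes in the record:

```python
import struct

# Known values packed as little-endian 8-byte doubles
print(struct.pack("<d", 32.11))   # b'\xaeG\xe1z\x14\x0e@@'
print(struct.pack("<d", 2651.0))  # b'\x00\x00\x00\x00\x00\xb6\xa4@'
# The int packed as a little-endian 4-byte int
print(struct.pack("<i", 0))       # b'\x00\x00\x00\x00'
```

Read in order, the record is four copies of 32.11, then 2651.0, then the int 0, then four more zero bytes.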

So my question may just be about encoding/decoding, but perhaps there is more to it, hence the backstory. Trying to decode the value as text, e.g.

kvpair[1].decode('utf-8')

with a variety of encodings just gives errors similar to this:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 0: invalid start byte
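That error is expected: the value is not text in any encoding but the raw in-memory bytes of the struct, and 0xae is not a valid first byte of a UTF-8 sequence. For eyeballing such data, `bytes.hex` with a separator (Python 3.8+) gives a view that lines up with the db_dump output, one 8-byte double per group. A minimal sketch using the value from the record above:

```python
value = (b'\xaeG\xe1z\x14\x0e@@' * 4
         + b'\x00\x00\x00\x00\x00\xb6\xa4@'
         + b'\x00' * 8)

# Decoding as text fails, since these bytes are not UTF-8 encoded text
try:
    value.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid start byte

# Hex view grouped into 8-byte chunks
print(value.hex(" ", 8))
```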

Upvotes: 0

Views: 415

Answers (1)

snakecharmerb

Reputation: 55724

The value is binary data (the raw bytes of the C struct), not encoded text, so it can be unpacked using Python's struct module.

>>> import struct
>>> bs = b'\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\x00\x00\x00\x00\x00\xb6\xa4@\x00\x00\x00\x00\x00\x00\x00\x00'
>>> len(bs)
48
>>> struct.unpack('<5di4x', bs)
(32.11, 32.11, 32.11, 32.11, 2651.0, 0)
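If the fields are needed by name, a precompiled `struct.Struct` can be combined with a `namedtuple` that mirrors the C field names. A minimal sketch, assuming the same '<5di4x' layout as above; `DVALS_STRUCT`, `DVals` and `decode_dvals` are hypothetical names:

```python
import struct
from collections import namedtuple

# Compiled once; same layout as the '<5di4x' format string
DVALS_STRUCT = struct.Struct("<5di4x")
DVals = namedtuple("DVals", "d1 d2 d3 d4 d5 i1")

def decode_dvals(value):
    """Unpack a 48-byte record value into named fields (pad bytes are skipped)."""
    return DVals._make(DVALS_STRUCT.unpack(value))

bs = (b'\xaeG\xe1z\x14\x0e@@' * 4
      + b'\x00\x00\x00\x00\x00\xb6\xa4@'
      + b'\x00' * 8)
rec = decode_dvals(bs)
print(rec.d5, rec.i1)                # 2651.0 0
print(DVALS_STRUCT.pack(*rec) == bs)  # True
```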

struct.unpack takes two arguments: a format string that describes the layout and types of the data, and the bytes to be unpacked. The format '<5di4x' describes:

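  • <: little-endian byte order, with standard sizes and no automatic alignment
  • 5d: five doubles (8 bytes each)
  • i: one signed int (4 bytes; use I for unsigned)
  • 4x: four pad bytes (the C compiler typically pads the end of DVALS so its size is a multiple of 8, the alignment of a double, giving 40 + 4 + 4 = 48 bytes)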
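The pad bytes matter because struct.unpack requires the buffer length to match the format's size exactly. A quick check with `struct.calcsize`, using 48 zero bytes as a stand-in for a record value:

```python
import struct

value = bytes(48)  # stand-in for a 48-byte record value

print(struct.calcsize("<5di"))    # 44: '<' mode adds no padding
print(struct.calcsize("<5di4x"))  # 48: matches len(value)

# Native mode pads between fields but never at the end; a trailing
# zero-count '0d' aligns the end to a double instead.
print(struct.calcsize("@5di0d"))  # 48 on typical 64-bit platforms

try:
    struct.unpack("<5di", value)
except struct.error as exc:
    print(exc)  # reports the size mismatch
```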

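Data can be packed in the same way, using struct.pack. Use the same explicit '<' byte order so the result does not depend on the byte order of the machine running Python:

>>> nums = [32.11, 32.11, 32.11, 32.11, 2651, 0]
>>> format_ = '<5di4x'
>>> packed = struct.pack(format_, *nums)
>>> packed == bs
True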
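Alternatively, and only if Python runs on the same kind of machine as the C program (same byte order and alignment rules), ctypes can mirror the C struct directly and let the platform's C layout rules supply the padding. A minimal sketch; the class below is a hypothetical Python copy of DVALS, and the printed values assume a typical little-endian 64-bit platform:

```python
import ctypes

# Hypothetical Python mirror of the C struct DVALS; ctypes applies the
# platform's native C alignment rules, as the C compiler would.
class DVALS(ctypes.Structure):
    _fields_ = [
        ("d1", ctypes.c_double),
        ("d2", ctypes.c_double),
        ("d3", ctypes.c_double),
        ("d4", ctypes.c_double),
        ("d5", ctypes.c_double),
        ("i1", ctypes.c_int),
    ]

print(DVALS.i1.offset)       # 40: i1 follows the five doubles
print(ctypes.sizeof(DVALS))  # 48: rounded up to a multiple of 8

# Copy the record bytes into a DVALS instance (native byte order)
sample = bytes.fromhex("ae47e17a140e4040" * 4 + "0000000000b6a440" + "00" * 8)
rec = DVALS.from_buffer_copy(sample)
print(rec.d1, rec.d5, rec.i1)  # 32.11 2651.0 0
```

The struct format is still the more portable choice, since '<5di4x' pins the byte order and padding explicitly instead of inheriting them from the machine running Python.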

Upvotes: 2
