Rob

Reputation: 311

C struct => ctypes struct ... is this mapping correct?

I'm trying to access two legacy de/compression functions from Python that are written in C and are currently available via a DLL (I have the C source).

The functions are passed a (partially) populated C struct and use this information to either compress or decompress the data in the buffer provided.

This is how the functions are called. I added __cdecl for Python compatibility.

// Both functions return 0 on success and nonzero value on failure
int __cdecl pkimplode(struct pkstream *pStr);
int __cdecl pkexplode(struct pkstream *pStr);

Here's the pkstream struct as defined in C:

struct pkstream {
   unsigned char *pInBuffer;           // Pointer to input buffer
   unsigned int nInSize;               // Size of input buffer
   unsigned char *pOutBuffer;          // Pointer to output buffer
   unsigned int nOutSize;              // Size of output buffer upon return
   unsigned char nLitSize;             // Specifies fixed or var size literal bytes
   unsigned char nDictSizeByte;        // Dictionary size; either 1024, 2048, or 4096
   // The rest of the members of this struct are used internally,
   // so setting these values outside pkimplode or pkexplode has no effect
   unsigned char *pInPos;              // Current position in input buffer
   unsigned char *pOutPos;             // Current position in output buffer
   unsigned char nBits;                // Number of bits in bit buffer
   unsigned long nBitBuffer;           // Stores bits until enough to output a byte
   unsigned char *pDictPos;            // Position in dictionary
   unsigned int nDictSize;             // Maximum size of dictionary
   unsigned int nCurDictSize;          // Current size of dictionary
   unsigned char Dict[0x1000];         // Sliding dictionary used for comp/decomp
};

This is my attempt at mirroring this struct in Python.

# Define the pkstream struct
class PKSTREAM(Structure):
   _fields_ = [('pInBuffer', c_ubyte),
               ('nInSize', c_uint),
               ('pOutBuffer', c_ubyte),
               ('nOutSize', c_uint),
               ('nLitSize', c_ubyte),
               ('nDictSizeByte', c_ubyte),
               ('pInPos', c_ubyte),
               ('pOutPos', c_ubyte),
               ('nBits', c_ubyte),
               ('nBitBuffer', c_ulong),
               ('pDictPos', c_ubyte),
               ('nDictSize', c_uint),
               ('nCurDictSize', c_uint),
               ('Dict', c_ubyte)]

I would really appreciate some help with the following questions (I'm asking up front rather than just 'winging' it, hopefully for obvious reasons):

  1. I'm not sure whether to use c_ubyte, c_char or c_char_p for the members of type unsigned char. c_ubyte is the closest ctypes match for unsigned char (according to the docs, at least), but it appears to be an int/long in Python.

  2. Sometimes the member is a pointer to an unsigned char ... would this map to c_char_p? ctypes docs say ALL byte & unicode strings are passed as pointers anyway, so what provisions do I need to make for this?

  3. I need to provide pOutBuffer to the function, which should be a pointer to the location of allocated memory to which the function can copy the de/compressed data. I believe I should use create_string_buffer() to create an appropriately sized buffer for this?

  4. I also need to know how to define the member Dict[0x1000], which looks (to me) like it creates a 4096-byte buffer for internal use within the functions. I know my definition is clearly wrong, but I don't know how it should be defined.

  5. Should the C functions be decorated as __stdcall or __cdecl? (I've been using the latter on some test DLLs as I've worked my way up to this point).

Any feedback would be VERY much appreciated!

Thanks in advance,

James

Upvotes: 3

Views: 2835

Answers (1)

jsbueno

Reputation: 110311

If the data in the structure is a pointer, you have to declare it as a pointer on the Python side as well.

One way to do that is to use the POINTER utility in ctypes. It sits at a somewhat higher level than ctypes.c_char_p (and is not fully compatible with it), but it makes your code more readable. To simulate C arrays, multiply a base ctypes type by an integer: the result is an array type of that length with the given element type, so c_ubyte * 4096 describes a 4096-byte array such as the Dict field.
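A minimal sketch of those two building blocks, independent of the DLL. The names Dict4096 and BytePtr are made up for illustration:

```python
import ctypes

# Multiplying a ctypes type by an integer yields a fixed-size array type.
Dict4096 = ctypes.c_ubyte * 0x1000
d = Dict4096()                      # zero-initialised, 4096 bytes
d[0] = 255                          # c_ubyte elements read and write as ints

# POINTER(T) builds a "pointer to T" type; cast() aims one at existing memory.
BytePtr = ctypes.POINTER(ctypes.c_ubyte)
p = ctypes.cast(d, BytePtr)

array_size = ctypes.sizeof(d)       # size in bytes of the whole array
first_via_pointer = p[0]            # reads the same memory as d[0]
```

The pointer does not own the memory, so the array object has to stay alive for as long as the pointer is used.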

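Note that unsigned char corresponds to c_ubyte, unsigned int to c_uint and unsigned long to c_ulong. The signed counterparts (c_char, c_int, c_long) have the same sizes, so either set produces the same memory layout; they differ only in how Python interprets the values (a c_char field, for example, is read and written as a one-byte bytes object rather than an integer).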
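To see the difference in practice, here is a small sketch using a throwaway struct (Demo is hypothetical and has nothing to do with pkstream):

```python
import ctypes

# Signed and unsigned variants occupy the same number of bytes.
same_int_size = ctypes.sizeof(ctypes.c_int) == ctypes.sizeof(ctypes.c_uint)
same_long_size = ctypes.sizeof(ctypes.c_long) == ctypes.sizeof(ctypes.c_ulong)

class Demo(ctypes.Structure):       # hypothetical struct, illustration only
    _fields_ = [('as_char', ctypes.c_char),
                ('as_ubyte', ctypes.c_ubyte)]

demo = Demo()
demo.as_char = b'\x08'              # c_char fields take a one-byte bytes object
demo.as_ubyte = 8                   # c_ubyte fields take a plain int
```

Both fields hold the same byte, but only the c_ubyte field can be used directly in arithmetic on the Python side.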

Your structure definition does not declare the pointed-to buffers as const. If you pass a Python string (immutable) and the library tries to modify it, you will get errors. Instead, pass mutable memory returned by create_string_buffer, initialised with your string.

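from ctypes import POINTER, Structure, c_char, c_ubyte, c_uint, c_ulong

# Define the pkstream struct.
# The buffer pointers are POINTER(c_char) so that objects returned by
# create_string_buffer() can be assigned to them directly.
class PKSTREAM(Structure):
   _fields_ = [('pInBuffer', POINTER(c_char)),   # unsigned char *
               ('nInSize', c_uint),              # unsigned int
               ('pOutBuffer', POINTER(c_char)),
               ('nOutSize', c_uint),
               ('nLitSize', c_ubyte),            # unsigned char
               ('nDictSizeByte', c_ubyte),
               # The remaining members are used internally by the library
               ('pInPos', POINTER(c_char)),
               ('pOutPos', POINTER(c_char)),
               ('nBits', c_ubyte),
               ('nBitBuffer', c_ulong),          # unsigned long
               ('pDictPos', POINTER(c_char)),
               ('nDictSize', c_uint),
               ('nCurDictSize', c_uint),
               ('Dict', c_ubyte * 0x1000)]       # unsigned char[0x1000]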
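As a rough sketch of how the buffers might be attached, assuming the unsigned field types that match the C declaration. The struct is repeated so the snippet runs on its own, and make_stream is a hypothetical helper, not part of the library. Which values nLitSize and nDictSizeByte must hold is not stated in the question, so they are left unset here:

```python
import ctypes
from ctypes import POINTER, Structure, c_char, c_ubyte, c_uint, c_ulong

class PKSTREAM(Structure):
    _fields_ = [('pInBuffer', POINTER(c_char)),
                ('nInSize', c_uint),
                ('pOutBuffer', POINTER(c_char)),
                ('nOutSize', c_uint),
                ('nLitSize', c_ubyte),
                ('nDictSizeByte', c_ubyte),
                ('pInPos', POINTER(c_char)),
                ('pOutPos', POINTER(c_char)),
                ('nBits', c_ubyte),
                ('nBitBuffer', c_ulong),
                ('pDictPos', POINTER(c_char)),
                ('nDictSize', c_uint),
                ('nCurDictSize', c_uint),
                ('Dict', c_ubyte * 0x1000)]

def make_stream(data, out_capacity):
    """Hypothetical helper: build a PKSTREAM backed by mutable buffers."""
    in_buf = ctypes.create_string_buffer(data)          # mutable copy of data
    out_buf = ctypes.create_string_buffer(out_capacity)  # zero-filled output
    stream = PKSTREAM()
    stream.pInBuffer = in_buf        # c_char arrays are accepted as POINTER(c_char)
    stream.nInSize = len(data)
    stream.pOutBuffer = out_buf
    # nOutSize is documented as set by the library on return, so it is not set here.
    # Return the buffers too, so the caller keeps them alive and can read the output.
    return stream, in_buf, out_buf

stream, in_buf, out_buf = make_stream(b'hello', 64)
```

After a successful call, the result would presumably be read with out_buf.raw[:stream.nOutSize]. The output capacity has to be chosen by the caller, since nothing in the struct tells the library how large the output buffer is.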

As for (5), either convention works as long as the Python side matches it: load the DLL with ctypes.CDLL for __cdecl functions, or with ctypes.WinDLL for __stdcall functions.
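On the Python side the calling convention is chosen by the loader class: ctypes.CDLL assumes __cdecl and ctypes.WinDLL (Windows only) assumes __stdcall. A sketch of declaring the two prototypes follows; bind_pk_functions is a hypothetical helper and the DLL file name in the comments is a placeholder:

```python
import ctypes

def bind_pk_functions(lib, stream_type):
    """Declare argument and return types for pkimplode and pkexplode.

    lib is an already loaded library object; stream_type is the ctypes
    Structure subclass that mirrors struct pkstream.
    """
    for name in ('pkimplode', 'pkexplode'):
        func = getattr(lib, name)
        func.argtypes = [ctypes.POINTER(stream_type)]   # struct pkstream *
        func.restype = ctypes.c_int                     # 0 on success
    return lib.pkimplode, lib.pkexplode

# Usage on Windows (file name is a placeholder):
#   lib = ctypes.CDLL('pklib.dll')       # functions built as __cdecl
#   lib = ctypes.WinDLL('pklib.dll')     # functions built as __stdcall
#   pkimplode, pkexplode = bind_pk_functions(lib, PKSTREAM)
#   status = pkimplode(ctypes.byref(stream))
```

Setting argtypes lets ctypes reject a wrongly typed argument before the call reaches the DLL.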

Upvotes: 2
