nakamurayuristeph

Reputation: 63

Questions on how ctypes (and potentially the Python implementation) handles data types

The following is my source code and its output from the Win10 command line.

from ctypes import *
class eH():
    x = c_uint(0)
    print(type(x) == c_uint)
print(type(eH.x) == c_uint)
print(eH.x)
print(type(eH.x))
print(type(eH.x) == c_ulong)
print(c_uint == c_ulong)
print(c_int == c_long)
print("\nEnd of eH prints\n#####")

class eHardware(Structure):
    _fields_ = [("xyz", c_uint)]
a = eHardware()
a.xyz = eH.x
print(a.xyz)
print(a.xyz == eH.x)
print(type(a.xyz))
print(type(c_uint(a.xyz)))

The command line output is in the link: https://pastebin.com/umWUDEuy

First thing I noticed is that c_uint == c_ulong outputs True. Does that mean ctypes dynamically assigns types on the fly and treats them as the same in memory? Would this design have any implications if I want to port a similar script to a type-sensitive language, say C?

Second, I assign a.xyz = eH.x, but afterwards a.xyz == eH.x evaluates to False. Also, the type of a.xyz has been converted to int, while eH.x is of type c_uint (or c_ulong, which is what type() always reports). Thanks in advance for the responses.

Upvotes: 1

Views: 400

Answers (1)

abarnert

Reputation: 365717

First thing I noticed is that c_uint == c_ulong outputs True.

This is explained right at the top of the docs:

Note: Some code samples reference the ctypes c_int type. On platforms where sizeof(long) == sizeof(int) it is an alias to c_long. So, you should not be confused if c_long is printed if you would expect c_int — they are actually the same type.

If you're wondering why it does this, it's to improve interaction with C code.

C is a weakly typed language: int and long are always distinct types, but you can always implicitly convert between them via the complicated integer promotion and narrowing rules.[1] On many platforms, int and long are both 32 bits (this includes both 32-bit and 64-bit Windows, which is why you see c_uint == c_ulong on Win10), so these rules don't matter much. But on other platforms, such as 64-bit Linux and macOS, long is 64 bits,[2] so they do. That makes it really easy to write code that works on your machine but segfaults on someone else's by corrupting the stack (possibly even in a way that attackers can exploit).

ctypes attempts to rein this in by explicitly defining c_int as an alias of c_long if and only if they're the same size. So:

  • If you're careful to always use c_int when the C function you're calling wants int and c_long when it wants long, your code will be portable, just like in C.
  • If you mix and match them arbitrarily, and that happens to be safe on your machine, it'll work on your machine, just like in C.
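  • If you mix and match them arbitrarily and then run the code on a machine where that isn't safe, you should get an exception out of ctypes rather than a segfault, as long as ctypes knows which type to expect (for example, because you set argtypes on the foreign function).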
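One way to watch that checking happen without compiling any C is a minimal sketch like the following, which assumes a CFUNCTYPE-wrapped Python function is checked the same way a DLL function with argtypes set is (both go through the declared argument types). The names INT_FUNC, _double and double are made up for illustration. The last call is the interesting one: it should succeed only where c_int is c_long.

```python
import ctypes

# Hypothetical prototype for a C function "int f(int)". Calling an instance
# from Python converts each argument through the declared argtypes (c_int).
INT_FUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


def _double(n):
    # Inside the callback, n arrives as a plain Python int.
    return n * 2


double = INT_FUNC(_double)

print(double(21))                # a Python int is converted to a C int
print(double(ctypes.c_int(21)))  # the declared ctypes type itself is accepted

# A c_long argument is only accepted where c_long and c_int are the same class.
try:
    double(ctypes.c_long(21))
    c_long_accepted = True
except ctypes.ArgumentError:
    c_long_accepted = False

print("c_long accepted:", c_long_accepted)
print("c_int is c_long:", ctypes.c_int is ctypes.c_long)
```

On Windows both lines should print True; on 64-bit Linux or macOS both should print False, and you get an ArgumentError instead of a silently truncated value.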

Does that mean ctypes dynamically assigns types on the fly and treats them as the same in memory?

I suppose it depends on what you mean by "on the fly". If you look at the source, you'll see that when the module is imported, it does this:

if _calcsize("i") == _calcsize("l"):
    # if int and long have the same size, make c_int an alias for c_long
    c_int = c_long
    c_uint = c_ulong
else:
    class c_int(_SimpleCData):
        _type_ = "i"
    _check_size(c_int)

    class c_uint(_SimpleCData):
        _type_ = "I"
    _check_size(c_uint)
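To see which branch your own interpreter took, a quick check like this should do; it repeats the same calcsize comparison (the ctypes source imports calcsize from the struct module as _calcsize) and compares the classes directly:

```python
import ctypes
import struct

# The same size test the ctypes source performs when it is imported.
same_size = struct.calcsize("i") == struct.calcsize("l")

print("sizeof(int) == sizeof(long):", same_size)
print("c_int is c_long:", ctypes.c_int is ctypes.c_long)
print("c_uint is c_ulong:", ctypes.c_uint is ctypes.c_ulong)

# When the alias is in effect, the class is literally c_ulong, so its name and
# repr say c_ulong even though your code wrote c_uint.
print(ctypes.c_uint.__name__)
print(repr(ctypes.c_uint(0)))
```

On a Windows build, the last two lines should show c_ulong, which matches what type(eH.x) prints in the question.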

Note that this code runs when the module is imported, not when it's compiled. Usually, when you import ctypes, Python loads a cached, pre-compiled .pyc file,[3] but that file only holds the bytecode for the if statement, so the _calcsize comparison still runs at import time, once per process. After that, c_int is simply whichever class that branch bound to the name, so in that sense, you don't have to worry about it being dynamic. But you can always monkeypatch ctypes.c_int to be something else, if you really want to. So, in that sense, it's definitely dynamic if you want it to be.[4]
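A small sketch of what that kind of monkeypatching does (and doesn't) affect, using c_int64 purely as an arbitrary replacement. It also matters for the question's code, which uses from ctypes import *: names imported that way are bound once and don't follow later changes to the module attribute.

```python
import ctypes
from ctypes import c_int  # binds c_int in this module at import time

original = ctypes.c_int
try:
    # Rebinding the module attribute changes what ctypes.c_int refers to...
    ctypes.c_int = ctypes.c_int64
    print(ctypes.c_int is ctypes.c_int64)
    # ...but a name already pulled in with "from ctypes import ..." does not change.
    print(c_int is original)
finally:
    ctypes.c_int = original  # always undo a monkeypatch like this
```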


Would this design have any implications if I want to port a similar script to a type-sensitive language, say C?

Well, the whole point of the design is to match C (and, in particular, the implementation-defined details of the C compiler used to build your CPython interpreter) as closely as possible while at the same time working around a few of the pitfalls of dealing with C. So, it's pretty rare that you design an interface with ctypes and then implement it with C; usually it's the other way around. But occasionally, it does happen (usually something to do with multiprocessing shared memory mapped to numpy arrays…).

In that case, just follow the same rules: keep c_int and c_long straight in your Python code, match them to int and long in your C code, and things will work. You will definitely want to enable (and read) your C compiler's warnings to catch places where you mix them up. And be prepared for occasional segfaults or memory corruption during debugging, but then you always need to be prepared for that in C.[5]
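A practical way to keep the two sides in sync is to mirror each C struct as a ctypes Structure and compare layouts. The sketch below assumes a hypothetical C declaration, struct pair { int count; long total; }; the names Pair, count and total are illustrative only.

```python
import ctypes


# Hypothetical Python mirror of this C declaration:
#
#     struct pair { int count; long total; };
#
# c_int must pair with int and c_long with long, exactly as in the header.
class Pair(ctypes.Structure):
    _fields_ = [("count", ctypes.c_int),
                ("total", ctypes.c_long)]


# Compare these numbers with sizeof/offsetof printed by a C program built with
# the same compiler; any mismatch means the two sides disagree on the layout.
print("sizeof(struct pair) =", ctypes.sizeof(Pair))
for name, _ in Pair._fields_:
    field = getattr(Pair, name)
    print(f"offsetof({name}) = {field.offset}, size = {field.size}")
```

The numbers differ between platforms (total typically lands at offset 4 on Windows and offset 8 on 64-bit Linux), which is exactly why the two sides need to agree on int versus long.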


Also, the type of a.xyz has been converted to int, while eH.x is of type c_uint

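The conversions to native types when you access struct members, pass arguments into C functions and return values out, etc. are pretty complicated. 95% of the time they just do what you want, and it's better not to worry about them. In your case: when you read a struct field whose type is a fundamental type like c_uint, ctypes transparently converts it to a native Python int, so a.xyz is the int 0. But eH.x is still a c_uint object, and ctypes objects don't compare by value (even c_uint(0) == c_uint(0) is False), so a.xyz == eH.x is False. Compare against eH.x.value instead.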
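A trimmed-down version of the question's code shows each step; it uses import ctypes instead of the star import, but the behavior should be the same:

```python
import ctypes


class eHardware(ctypes.Structure):
    _fields_ = [("xyz", ctypes.c_uint)]


x = ctypes.c_uint(0)
a = eHardware()
a.xyz = x  # storing a c_uint instance copies its value into the struct

print(type(a.xyz))             # reading the field back gives a plain int
print(a.xyz == x)              # int vs. ctypes object: no value comparison
print(a.xyz == x.value)        # compare the underlying values instead
print(ctypes.c_uint(0) == ctypes.c_uint(0))  # two ctypes objects: identity only
```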

The first time you hit the other 5% (usually it's because you have a c_char_p that you want to treat as a pointer rather than a string…), there's really no substitute for reading through the docs and learning about the default conversions and _as_parameter_ and _CData classes and restype vs. errcheck and so on. And doing a bit of experimentation in the interactive interpreter, and maybe reading the source.[6]
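As a taste of that c_char_p case, here is a sketch that views the same buffer (an arbitrary example containing an embedded NUL) three ways with ctypes.cast. c_char_p gets the automatic string conversion, while POINTER(c_char) and c_void_p leave you in charge:

```python
import ctypes

# A C buffer holding b"ab\0cd" (create_string_buffer adds a trailing NUL too).
buf = ctypes.create_string_buffer(b"ab\0cd")

# Through c_char_p, ctypes treats the memory as a NUL-terminated string and
# hands back bytes, stopping at the first NUL.
as_string = ctypes.cast(buf, ctypes.c_char_p).value
print(as_string)

# Through POINTER(c_char), you get a pointer object and choose how much to read.
as_pointer = ctypes.cast(buf, ctypes.POINTER(ctypes.c_char))
print(as_pointer[:5])

# Through c_void_p, .value is the raw address as a Python int.
addr = ctypes.cast(buf, ctypes.c_void_p).value
print(addr == ctypes.addressof(buf))
```

The same choice applies to restype: declaring c_char_p gets you bytes (and loses the pointer), while c_void_p or POINTER(c_char) keeps the address so you can, for example, pass it back to a C function that frees it.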


1. Most modern compilers will warn about narrowing conversions, and even let you optionally turn them into errors.

2. In the old days, when ctypes was first designed, it was more common for int to be 16 bits, but the effect is the same.

3. If you use a Windows or Mac Python installer, or an RPM or DEB binary package, or a Python that came preinstalled on your system, the stdlib was almost always compiled at the time of building the binary package, on someone else's machine. If you build from source, it's usually compiled at build or install time on your machine. If not, it usually gets compiled the first time you import ctypes.

4. Although I don't know why you'd want it to be. Easier to just define your own type with a different name…

5. You might want to consider using a language that's statically typed, and C-compatible, but has a much stricter and stronger type system than C, like Rust, or at least C++ or D. Then the compiler can do a lot more to help you make sure you're getting things right. But the tradeoffs here are really the same as they always are in choosing between C and another language; there's nothing all that ctypes-specific involved.

6. And finally throwing your hands in the air and declaring that from now on you're only ever going to use cffi instead of ctypes, which lasts until the first time you run into one of cffi's quirks…

Upvotes: 1
