Reputation: 12374
I have a little VM for a programming language implemented in C. It supports being compiled under both 32-bit and 64-bit architectures as well as both C and C++.
I'm trying to make it compile cleanly with as many warnings enabled as possible. When I turn on CLANG_WARN_IMPLICIT_SIGN_CONVERSION, I get a cascade of new warnings.
I'd like to have a good strategy for when to use int versus either explicitly unsigned types, and/or explicitly sized ones. So far, I'm having trouble deciding what that strategy should be.
It's certainly true that mixing them—using mostly int for things like local variables and parameters and using narrower types for fields in structs—causes lots of implicit conversion problems.
I do like using more specifically sized types for struct fields because I like the idea of explicitly controlling memory usage for objects in the heap. Also, for hash tables, I rely on unsigned overflow when hashing, so it's nice if the hash table's size is stored as uint32_t.
But, if I try to use more specific types everywhere, I find myself in a maze of twisty casts everywhere.
What do other C projects do?
Upvotes: 58
Views: 10265
Reputation: 1
Most of the time, using int is not ideal. The main reason is that int is signed, and signed overflow is undefined behavior; signed integers can also be negative, which you don't need for most integers. Prefer unsigned integers. Secondly, data types reflect meaning and are a (very limited) way to document the range of values a variable may hold. If you use int, you imply that you expect this variable to sometimes hold negative values, and that these values probably do not always fit into 8 bits but will always fit into INT_MAX, which can be as low as 32767. Do not assume an int is 32 bits.
Always think about the possible values of a variable and choose the type accordingly. I use the following rules:
- For array indices and sizes: size_t, except when there are good reasons not to. Almost never use int for it; an int can be too small, and there is a high chance of creating a UB bug that isn't found during testing because you never tested arrays large enough.
- For object sizes: size_t.
- For pointer differences: ptrdiff_t. But be aware that ptrdiff_t can be too small, though that is rare.
- When memory usage matters: the uint_fastN_t, uintN_t, or uint_leastN_t types. This can make a lot of sense, especially on an 8-bit microcontroller.
- unsigned int can be used instead of uint_fast16_t, and similarly int for int_fast16_t.
- For bytes and characters: int. An int can store -1 if you need an indicator for "error" or "not set", and a character literal is of type int. (This is true for C; for C++ you may use a different strategy.) There is the extremely rare possibility that a machine uses sizeof(int)==1 && CHAR_MIN==0, where a byte cannot be handled with an int, but I never saw such a machine.
After a certain size, a project needs a list/enum of the native integer data types. You can use macros with the _Generic expression from C11, which only needs to handle bool, signed char, short, int, long, long long and their unsigned counterparts, to get the underlying native type from a typedefed one. This way your parsers and similar parts only need to handle 11 integer types rather than the 56 standard integer types (if I counted correctly) plus a bunch of other non-standard types.
Upvotes: 0
Reputation: 9180
You seem to know what you are doing, judging from the linked source code, which I took a glance at.
You said it yourself - using "specific" types makes you have more casts. That's not an optimal route to take anyway. Use int as much as you can, for things that do not mandate a more specialized type.
The beauty of int is that it abstracts over the types you speak of. It is optimal in all cases where you need not expose the construct to a system unaware of int. It is your own tool for abstracting the platform for your program(s). It may also yield speed, size, and alignment advantages, depending on the platform.
In all other cases, e.g. where you want to deliberately stay close to machine specifications, int can and sometimes should be abandoned. Typical cases include network protocols, where the data goes on the wire, and interoperability facilities - bridges of sorts between C and other languages, or kernel assembly routines accessing C structures. But don't forget that sometimes you would in fact want to use int even in these cases, as it follows the platform's own "native" or preferred word size, and you might want to rely on that very property.
With platform types like uint32_t, a kernel might want to use these (although it may not have to) in its data structures if those are accessed from both C and assembler, as the latter typically doesn't know what an int is supposed to be.
To sum up, use int as much as possible, and resort to moving from more abstract types to "machine" types (bytes/octets, words, etc.) in any situation that requires it.
As to size_t and other "usage-suggestive" types - as long as the syntax follows the semantics inherent to the type - say, using size_t for, well, size values of all kinds - I would not contest it. But I would not liberally apply it to anything just because it is guaranteed to be the largest type (regardless of whether that is actually true). That's a hidden pitfall you don't want to step on later. Code has to be self-explanatory to the degree possible, I would say - having a size_t where none is naturally expected would raise eyebrows, for good reason. Use size_t for sizes. Use off_t for offsets. Use [u]intN_t for octets, words, and such things. And so on.
This is about applying semantics inherent in a particular C type, to your source code, and about the implications on the running program.
Also, as others have illustrated, don't shy away from typedef, as it gives you the power to efficiently define your own types, an abstraction facility I personally value. Good program source code may not expose a single int, while nevertheless relying on int aliased behind a multitude of purpose-defined types. I am not going to cover typedef here; the other answers hopefully will.
Upvotes: 15
Reputation: 89
Keep large numbers that are used to access members of arrays, or to control buffers, as size_t.
For an example of a project that makes use of size_t, refer to GNU's dd.c, line 155.
Upvotes: 8
Reputation: 3413
Here are a few things I do. Not sure they're for everyone but they work for me.
- I never use int or unsigned int directly. There always seems to be a more appropriately named type for the job.
- Where an exact width is required, I use a fixed-size type (e.g. uint32_t).
- For loop counters, I use a fast type (e.g. uint_fast16_t), selecting the type based on the minimum size required to access all array elements. For example, if I have a for loop that will iterate through 24 elements max, I'll use uint_fast8_t and let the compiler (or stdint.h, depending how pedantic we want to get) decide which is the fastest type for that operation.
If you disagree with any of those or have recommended alternatives please let me know in the comments! That's the life of a software developer... we keep learning or we become irrelevant.
Upvotes: 2
Reputation: 1189
Always use int, unless you have specific reasons for a more specific type: for example, you're on a 16-bit platform and need integers greater than 32767, or you need to ensure proper byte order and signedness for data exchanged over a network or in a file (and unless you're resource-constrained, consider transferring data in "plain text", meaning ASCII, or UTF-8 if you prefer).
My experience has shown that "just use 'int'" is a good maxim to live by and makes it possible to turn out working, easily maintained, correct code quickly every time. But your specific situation may differ, so take this advice with a bit of well-deserved scrutiny.
Upvotes: 0
Reputation: 50368
Just using int everywhere may seem tempting, since it minimizes the need for casting, but there are several potential pitfalls you should be aware of:
An int might be shorter than you expect. Even though, on most desktop platforms, an int is typically 32 bits, the C standard only guarantees a minimum width of 16 bits. Could your code ever need numbers larger than 2^15 − 1 = 32,767, even for temporary values? If so, don't use an int. (You may want to use a long instead; a long is guaranteed to be at least 32 bits.)
Even a long might not always be long enough. In particular, there is no guarantee that the length of an array (or of a string, which is a char array) fits in a long. Use size_t (or ptrdiff_t, if you need a signed difference) for those.
In particular, a size_t is defined to be large enough to hold any valid array index, whereas an int or even a long might not be. Thus, for example, when iterating over an array, your loop counter (and its initial / final values) should generally be a size_t, at least unless you know for sure that the array is short enough for a smaller type to work. (But be careful when iterating backwards: size_t is unsigned, so for (size_t i = n-1; i >= 0; i--) is an infinite loop! Using i != SIZE_MAX or i != (size_t) -1 should work, though; or use a do/while loop, but beware of the case n == 0!)
An int is signed. In particular, this means that int overflow is undefined behavior. If there's ever any risk that your values might legitimately overflow, don't use an int; use an unsigned int (or an unsigned long, or uintNN_t) instead.
Sometimes, you just need a fixed bit length. If you're interfacing with an ABI, or reading / writing a file format, that requires integers of a specific length, then that's the length you need to use. (Of course, in such situations, you may also need to worry about things like endianness, and so may sometimes have to resort to manually packing data byte-by-byte anyway.)
All that said, there are also reasons to avoid using the fixed-length types all the time: not only is int32_t awkward to type, but forcing the compiler to always use 32-bit integers is not always optimal, particularly on platforms where the native int size might be, say, 64 bits. You could use C99's int_fast32_t, but that's even more awkward to type.
Thus, here are my personal suggestions for maximum safety and portability:
Define your own integer types for casual use in a common header file, something like this:
#include <limits.h>
typedef int i16;
typedef unsigned int u16;
#if UINT_MAX >= 4294967295U
typedef int i32;
typedef unsigned int u32;
#else
typedef long i32;
typedef unsigned long u32;
#endif
Use these types for anything where the exact size of the type doesn't matter, as long as they're big enough. The type names I've suggested are both short and self-documenting, so they should be easy to use in casts where needed, and minimize the risk of errors due to using a too-narrow type.
Conveniently, the u32 and u16 types defined as above are guaranteed to be at least as wide as unsigned int, and thus can be used safely without having to worry about them being promoted to int and causing undefined overflow behavior.
Use size_t for all array sizes and indexing, but be careful when casting between it and any other integer types. Optionally, if you don't like to type so many underscores, typedef a more convenient alias for it too.
For calculations that assume overflow at a specific number of bits, either use uintNN_t, or just use u16 / u32 as defined above and explicit bitmasking with &. If you choose to use uintNN_t, make sure to protect yourself against unexpected promotion to int; one way to do that is with a macro like:
#define u(x) (0U + (x))
which should let you safely write e.g.:
uint32_t a = foo(), b = bar();
uint32_t c = u(a) * u(b); /* this is always unsigned multiply */
For external ABIs that require a specific integer length, again define a specific type, e.g.:
typedef int32_t fooint32; /* foo ABI needs 32-bit ints */
Again, this type name is self-documenting, with regard to both its size and its purpose.
If the ABI might actually require, say, 16- or 64-bit ints instead, depending on the platform and/or compile-time options, you can change the type definition to match (and rename the type to just fooint) — but then you really do need to be careful whenever you cast anything to or from that type, because it might overflow unexpectedly.
If your code has its own structures or file formats that require specific bit lengths, consider defining custom types for those too, exactly as if it were an external ABI. Or you could just use uintNN_t instead, but you'll lose a little bit of self-documentation that way.
For all these types, don't forget to also define the corresponding _MIN and _MAX constants for easy bounds checking. This might sound like a lot of work, but it's really just a couple of lines in a single header file.
Finally, remember to be careful with integer math, especially overflows.
For example, keep in mind that the difference of two n-bit signed integers may not fit in an n-bit int. (It will fit into an n-bit unsigned int, if you know it's non-negative; but remember that you need to cast the inputs to an unsigned type before taking their difference to avoid undefined behavior!)
Similarly, to find the average of two integers (e.g. for a binary search), don't use avg = (lo + hi) / 2, but rather e.g. avg = lo + (hi + 0U - lo) / 2; the former will break if the sum overflows.
Upvotes: 34