Reputation: 3869
I'm implementing a toy project to learn C and I have a seemingly simple question about unsigned type conversion rules.
In particular, I would like to know if the C standard expects unsigned types converted to smaller unsigned types to simply lose their most significant bits without using any bitmask.
Example: 0xABC (16 bit) -> 0xBC (8 bit)
Example code (Shared link):
#include <stdint.h>
#include <stdio.h>
void print_small_hex_value(uint8_t value) {
    printf("Small hex value from function: %llx\n", value);
}

int main()
{
    uint64_t large_value = 0xABCDEFABCDEFABCD;
    printf("Large hex value: %llx\n", large_value);

    uint8_t small_value = large_value; /* without bit mask */
    printf("Small hex value: %llx\n", small_value);

    uint8_t small_value_masked = large_value & 0xFF; /* with bit mask */
    printf("Small hex value masked: %llx\n", small_value_masked);

    printf("\n");
    print_small_hex_value(large_value); /* print from function */
    print_small_hex_value(large_value & 0xFF);
    print_small_hex_value(small_value);
}
Output:
Large hex value: abcdefabcdefabcd
Small hex value: cd
Small hex value masked: cd
Small hex value from function: cd
Small hex value from function: cd
Small hex value from function: cd
It seems to me that the “magical” conversion works even without bit masks.
So why do many codebases (e.g., CPython) force the bits through bit masking (a.k.a. value & 0xFF)?
Is it simply elided later by compilers if not necessary? Is it just me not noticing that in these cases you are really dealing with signed integers?
What's the difference if the larger value (e.g., uint64_t) is passed as a uint8_t parameter or stored in a uint8_t variable? Are the two cases treated differently by compilers?
Can someone point to a trusted source on this matter (like the C standard)?
Upvotes: 4
Views: 100
Reputation: 214770
Regarding the trusted sources/C standard part:
How do we know that a conversion takes place?
You didn't force an explicit conversion with a cast in your example, you just wrote uint8_t small_value = large_value;, so an implicit conversion happened here. How can we tell that this would happen? This is the assignment operator, so we have to dig up the rules for that one. C17 6.5.16.1:
The type of an assignment expression is the type the left operand would have after lvalue conversion.
/--/
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
Okay, that's not very helpful. We can tell that a conversion will take place, and that it will happen "to the type the left operand would have after lvalue conversion". Okaaay, so we dig up lvalue conversions next, C17 6.3.2.1:
An lvalue is an expression (with an object type other than void) that potentially designates an object
/--/
"Except when..." (long list of exceptions here)
"...an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion. If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue."
Ok, this is just gibberish for anyone but C "language lawyers". What this means in plain English is that during assignment the right operand is converted to the type of the left operand, and in case the left operand happened to have a qualified type such as volatile uint8_t, the qualifier is discarded from the type of the assignment expression before the conversion.
So in this case, uint64_t is implicitly converted to uint8_t, as guaranteed by the rules of the assignment operator.
What are the rules for the actual conversion?
Integer conversions from any integer type to an unsigned type are always well-defined:
C17 6.3.1.3
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60)
60) The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
So this rule works "as if modulo". If we have some integer with value 0xABBA and convert it to uint8_t, then:
- One more than the maximum value that can be represented in uint8_t is 256.
- Repeatedly subtracting 256 from 0xABBA, the first number we get that is in the 0-255 value range of the new uint8_t type is 186, 0xBA.
Upvotes: 3
Reputation: 141698
C standard expects unsigned types converted to smaller unsigned types to simply lose their most significant bits without using any bitmask.
Yes.
The line:
printf("Small hex value: %llx\n", small_value);
and similar others are invalid. See https://godbolt.org/z/b7xa794x1 . %llx expects an unsigned long long argument, but small_value has type uint8_t. You should use PRIx8 from inttypes.h to print it.
Is it simply elided later by compilers if not necessary?
Generally, yes.
Is it just me not noticing that in these cases you are really dealing with signed integers?
No.
What's the difference if the larger value (i.e. uint64_t) is passed as a uint8_t parameter or stored in a uint8_t variable?
No difference.
Are the two cases treated differently by compilers?
Except the obvious, no.
Can someone point to a trusted source on this matter (like C standard)?
When a value is assigned to a variable of a particular type, that value is converted to the destination type. You may read https://port70.net/~nsz/c/c11/n1570.html#6.3.1.3p2 :
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type
0xABCDEFABCDEFABCD is 12379814471884843981. We repeatedly subtract 256 from this number 48358650280800171 times. After that operation, we are left with 205, which is 0xCD in hex. This is basically a fancy way of describing & 0xff.
Nowadays, we have the more digestible cppreference: https://en.cppreference.com/w/c/language/conversion .
why do many codebases (e.g. CPython) force the bits through bit masking (a.k.a. value & 0xFF)?
It may be the programmer's preference, for readability or maintainability. There are also coding standards for C: MISRA C:2012 rule 10.3, for example, requires you to write uint8_t small_value = (uint8_t)large_value;, but I do not know of a rule that would require masking.
Upvotes: 5