tansy

Reputation: 566

adding char as int gives unexpected result

I'm writing some hex-to-binary and binary-to-hex conversion functions and testing which one is faster, but I ran into a strange result when adding 'a' as an integer.

#include <stdio.h>

/* convert bin to hex char [0-9a-f] */
static inline unsigned char ToHex4bits1(unsigned char znak)
//unsigned char ToHex4bits1(unsigned char znak)
    {
    znak &= 0x0F;
    switch(znak)
        {
        case 10: return 'a';
        case 11: return 'b';
        case 12: return 'c';
        case 13: return 'd';
        case 14: return 'e';
        case 15: return 'f';
        default: return znak + 48;  /// 48  0x30    '0'
        }
    }

/* convert bin to hex char [0-9a-f] */    
static inline unsigned char ToHex4bits2(unsigned char znak)
//unsigned char ToHex4bits2(unsigned char znak)
    {
    //unsigned char add = '0';
    int add = '0';  /// [0-9]; add value of '0' (65 0x41    '0')
    znak &=0x0F;
    if(znak > 9)  /// [a-f]; if `znak' <0x0a, 0x0f> /// just one comparison as `znak' cannot be bigger than 15 anyway (znak &=0x0F;)
        {
        add = 0x61;  /// 'a'; // 87 0x61    'a'
        }
    return znak + add;
    }

//-----------//
int main()
    {
    int i;
    //char z;
    int z; 

    printf("\nToHex4bits1(i)\n");
    for(i=0; i<16; i++)
        {
        z = ToHex4bits1(i);
        printf("%d\t%02x\t%c\n", z, z, z);
        }

    printf("\nToHex4bits2(i)\n");
    for(i=0; i<16; i++)
        {
        z = ToHex4bits2(i);
        printf("%d\t%02x\t%c\n", z, z, z);
        }
    return 0;
    }

When I compile and run it with $ gcc -o tohex4bits tohex4bits.c; ./tohex4bits I get this result:

ToHex4bits1(i)
48  30  0
49  31  1
(...)
57  39  9
97  61  a
98  62  b
(...)
102 66  f
48  30  0
# which is what I expected

ToHex4bits2(i)
48  30  0
49  31  1
(...)
57  39  9
107 6b  k # that's where things get interesting; it's 10 too much ('k'-'a'==10)
108 6c  l
109 6d  m
110 6e  n
111 6f  o
112 70  p

# which is wrong

What is actually wrong with the second function, ToHex4bits2()? Why does adding 'a' (97/0x61) produce 'k' (107/0x6b), or 'A' produce 'K' for that matter?

Upvotes: 1

Views: 135

Answers (2)

klutt

Reputation: 31389

The reason is simple. If znak is 10, you want to return 'a', but you're returning 'a' + 10. So in the znak > 9 branch, return znak + add - 10 instead (or set add to 'a' - 10 there).

But you're making it extremely difficult for yourself: magic constants all over the place and overly complicated code for a simple task. This would do as the function body:

{
znak &= 0x0F;
if(znak > 9) 
    return znak + 'a' - 10;
else
    return znak + '0';
}

Or this if you want to be more compact. You're obviously not afraid of complicated code:

{
znak &= 0x0F;
return znak > 9 ? znak + 'a' - 10 : znak + '0';
}

You mentioned that you're trying to optimize this code. I have a hard time seeing how much you could gain here. You would likely be better off profiling a bigger chunk of the program to see whether the algorithms themselves are the problem. But there is one minor thing we can do to the first version (it relies on a GCC/Clang builtin), and it's this:

#define likely(x)      __builtin_expect(!!(x), 1)
#define unlikely(x)    __builtin_expect(!!(x), 0)
static inline unsigned char ToHex4bits1(unsigned char znak)
{
    znak &= 0x0F;
    // Hint to the compiler that this branch is the less likely one,
    // so it can lay out the code to favor the digit path
    if(unlikely(znak > 9))
        return znak + 'a' - 10;
    else
        return znak + '0';
}

Read about it here: https://www.geeksforgeeks.org/branch-prediction-macros-in-gcc/

But I think the fastest method is this:

static inline unsigned char ToHex4bits1(unsigned char znak)
{
    // static: the table has a single instance instead of being set up per call
    static const unsigned char ret[] = { '0', '1', '2', '3', '4', '5', '6', '7',
                                         '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
    return ret[znak & 0x0F];
}
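
The table approach can also be written with a string literal as the table. Here is a minimal sketch of that idea; the names hex_digits, to_hex_nibble and byte_to_hex are hypothetical and not from the question, and byte_to_hex assumes the caller passes a buffer of at least 3 chars:

```c
/* Lookup table as a string literal: index 0..15 maps to '0'..'9', 'a'..'f'. */
static const char hex_digits[] = "0123456789abcdef";

/* Same contract as ToHex4bits1: only the low 4 bits of znak are used. */
static inline unsigned char to_hex_nibble(unsigned char znak)
{
    return (unsigned char)hex_digits[znak & 0x0F];
}

/* Hypothetical helper: writes the two hex characters of one byte,
   high nibble first, followed by a terminating NUL. */
static inline void byte_to_hex(unsigned char byte, char out[3])
{
    out[0] = hex_digits[byte >> 4];
    out[1] = hex_digits[byte & 0x0F];
    out[2] = '\0';
}
```

Whether this is actually faster than the branchy versions depends on the compiler and the surrounding code, so measure before committing to it.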

Upvotes: 1

Some programmer dude

Reputation: 409196

Let's take a closer look at the ToHex4bits2 function:

static inline unsigned char ToHex4bits2(unsigned char znak)
//unsigned char ToHex4bits2(unsigned char znak)
    {
    //unsigned char add = '0';
    int add = '0';  /// [0-9]; add value of '0' (65 0x41    '0')
    znak &=0x0F;
    if(znak > 9)  /// [a-f]; if `znak' <0x0a, 0x0f> /// just one comparison as `znak' cannot be bigger than 15 anyway (znak &=0x0F;)
        {
        add = 0x61;  /// 'a'; // 87 0x61    'a'
        }
    return znak + add;
    }

If the value of znak is larger than 9, you add the value 0x61 (the ASCII code for 'a'). If znak is, for example, 11 (hex 0xb), the addition results in 0x6c, which is the ASCII code for 'l'. To fix this, subtract 10 (0xa) from znak first.

And of course you should not use magic numbers. If you mean the character 'a' then say so. In the code itself.
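
Putting both points together, one possible corrected version of ToHex4bits2 could look like the sketch below. It keeps the original structure and only changes the offset. It assumes a character set where 'a' to 'f' are contiguous (true for ASCII; the C standard only guarantees this for the digits).

```c
/* Convert the low 4 bits of znak to a hex character [0-9a-f]. */
static inline unsigned char ToHex4bits2(unsigned char znak)
{
    int add = '0';          /* 0..9 map to '0'..'9' */
    znak &= 0x0F;
    if (znak > 9)
        add = 'a' - 10;     /* 10..15 map to 'a'..'f', since 10 + ('a' - 10) == 'a' */
    return (unsigned char)(znak + add);
}
```

With this change, the second loop in main() should print the same characters as the ToHex4bits1 loop.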

Upvotes: 3
