d11wtq

Reputation: 35308

Convert hex string to string of bytes

I need to convert a (potentially very long) string like char * s = "2f0a3f" into the actual bytes it represents, when decoded from the hex representation. Currently I'm doing this, but it feels clunky and wrong.

  size_t hexlength = strlen(s);
  size_t binlength = hexlength / 2;

  unsigned char * buffer = malloc(binlength);
  long i = 0;
  char a, b;

  for (; i < hexlength; i += 2) {
    a = s[i + 0]; b = s[i + 1];
    buffer[i / 2] =
      ((a < '9' ? a - '0' : a - 'a' + 10) << 4) + (b < '9' ? b - '0' : b - 'a' + 10);
  }

Two things strike me as ugly about this:

  1. The way I'm dividing by two each time I push into the buffer
  2. The conditional logic to figure out the decimal value of the hex digits

Is there a better way? Preferably not using something I'd have to add a dependency on (since I want to ship this code with minimal cross-platform issues). My bitwise math is awful ;)

NOTE: The data has been pre-validated to all be lowercase and to be a correct string of hex pairs.

Upvotes: 6

Views: 7227

Answers (7)

ShunShirou

Reputation: 9

I came up with a simpler function that takes the string, converts it two characters at a time, and writes each resulting byte into a byte array, for a given size n, with bounds and integrity checks:

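#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Converts the first n hex characters of str into n / 2 bytes.
 * byte_array must have room for at least n / 2 bytes. */
int8_t convert_str_to_bytes(uint8_t *byte_array, const char *str, size_t n)
{
    const char *hex_match = "0123456789abcdefABCDEF";
    size_t i, j = 0;
    char cbuf[3];
    long ibuf;

    if (strlen(str) < n) {
            printf("ERROR: String is shorter than specified size.\n");
            return -1;
    }

    for (i = 0; i < n; i += 2) {

            strncpy(cbuf, &str[i], 2);
            cbuf[2] = '\0';   /* strncpy() does not terminate a 2-char copy */

            if (strspn(cbuf, hex_match) != 2) {
                    printf("ERROR: String is not a hexadecimal representation. Breaking now...\n");
                    return -1;
            }

            ibuf = strtol(cbuf, NULL, 16);

            byte_array[j] = (uint8_t)ibuf;
            ++j;
    }

    return 0;
}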
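One pitfall worth spelling out: strncpy() does not add a terminator when it copies exactly n characters, so cbuf has to be terminated by hand before strspn() or strtol() read it. A minimal sketch of converting a single pair safely, assuming both characters have already been validated as hex digits (hex_pair_to_byte is a hypothetical helper name, not part of the answer above):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts the two hex digits at pair[0] and pair[1]
 * into one byte. Assumes both characters are valid hex digits. */
uint8_t hex_pair_to_byte(const char *pair)
{
    char cbuf[3];

    memcpy(cbuf, pair, 2);  /* copy exactly two characters */
    cbuf[2] = '\0';         /* strtol() needs a terminated string */

    return (uint8_t) strtol(cbuf, NULL, 16);
}
```

Because cbuf is terminated after exactly two characters, strtol() stops at the pair boundary instead of running on into the following digits.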

Upvotes: 0

madex

Reputation: 1

Here are some small improvements for MISRA compliance. The original function name was confusing, so I renamed it.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static inline uint8_t HexcharToInt(char c) {
    uint8_t result = 16U;   /* 16 marks an invalid hex character */
    if (('0' <= c) && (c <= '9')) {
        result = (uint8_t) (c - '0');
    } else if (('a' <= c) && (c <= 'f')) {
        result = (uint8_t) (c + 10 - 'a');
    } else if (('A' <= c) && (c <= 'F')) {
        result = (uint8_t) (c + 10 - 'A');
    }
    return result;
}

uint8_t *array = NULL;

/* Returns the number of bytes written to array, or 0 on error.
 * An odd-length string is an error, because '\0' is read as the last digit. */
size_t hexstringToArray(const char *hexstring) {
    size_t len    = (strlen(hexstring) + 1) / 2; // round up
    if (array != NULL) {
        free(array);
        array = NULL;
    }
    array = (uint8_t*) malloc(len);
    if (array == NULL) {
        return 0;
    }
    uint8_t *arr = array;
    for (size_t i = 0; (i < len) && (len > 0); i++) {
        *arr = 0U;
        for (uint8_t shift = 8U; (shift > 0U) && (len > 0); ) {
            shift -= 4U;   /* 4 for the high nibble, then 0 for the low one */
            uint8_t curInt = HexcharToInt(*hexstring++);
            if (curInt >= 16U) {
                len = 0;
            } else {
                *arr |= ((uint8_t) curInt << shift);
            }
        }
        arr++;
    }
    return len;
}

Upvotes: 0

Fan

Reputation: 1

#include <stddef.h>
#include <string.h>

/* Returns the value of a hex digit, or -1 if c is not one.
 * Returns int, not char: where char is unsigned, -1 would never test < 0. */
static inline int HexToChar(char c)
{
    if ('0' <= c && c <= '9')
    {
        return c - '0';
    }
    else if ('a' <= c && c <= 'f')
    {
        return c + 10 - 'a';
    }
    else if ('A' <= c && c <= 'F')
    {
        return c + 10 - 'A';
    }

    return -1;
}

size_t HexToBinrary( const char* hex, size_t length, unsigned char* binrary, size_t binrary_cap )
{
    if (length % 2 != 0 || binrary_cap < length / 2)
    {
        return 0;
    }

    memset(binrary, 0, binrary_cap);
    size_t n = 0;
    for (size_t i = 0; i < length; i += 2, ++n)
    {
        int high = HexToChar(hex[i]);
        if (high < 0)
        {
            return 0;
        }

        int low = HexToChar(hex[i + 1]);
        if (low < 0)
        {
            return 0;
        }

        binrary[n] = (unsigned char) (high << 4 | low);
    }
    return n;
}

Upvotes: -1

egrunin

Reputation: 25053

/* allocate the buffer */
unsigned char * buffer = malloc((strlen(s) / 2) + 1);

const char *h = s; /* this will walk through the hex string */
unsigned char *b = buffer; /* point inside the buffer */

/* offset into this string is the numeric value */
const char xlate[] = "0123456789abcdef";

for ( ; *h; h += 2, ++b) /* go by twos through the hex string */
   *b = (unsigned char) (((strchr(xlate, *h) - xlate) * 16) /* multiply leading digit by 16 */
       + (strchr(xlate, *(h+1)) - xlate));

Edited to add

In 80x86 assembly language, the heart of strchr() is basically one instruction - it doesn't loop.

Also: this does no bounds checking, won't work with Unicode console input, and will likely crash if passed an invalid character (strchr() returns NULL for it).

Also: thanks to those who pointed out some serious typos.
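The crash mentioned above comes from strchr() returning NULL for a character that isn't in the table; subtracting xlate from NULL is undefined behavior. A minimal sketch of the same offset trick with that case handled, where xlate_digit is a hypothetical name and -1 signals an invalid character:

```c
#include <string.h>

/* Hypothetical helper: returns the value of a lowercase hex digit,
 * or -1 if c is not one. The position of c in xlate is its value. */
int xlate_digit(char c)
{
    static const char xlate[] = "0123456789abcdef";
    const char *p;

    if (c == '\0')           /* strchr() would match the terminator itself */
        return -1;

    p = strchr(xlate, c);
    return p ? (int) (p - xlate) : -1;
}
```

The '\0' check matters because strchr() treats the terminating null as part of the string, so it would otherwise return index 16.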

Upvotes: 5

ecatmur

Reputation: 157354

Not that it'd make much difference, but I'd go with a multiplication over a division. Also it's worth splitting out the digit code, as you might want to port it to a platform where a-f are not adjacent in the character set (only joking!)

  /* either as a function... */
  static inline int digittoint(char d) {
    return ((d) <= '9' ? (d) - '0' : (d) - 'a' + 10);
  }
  /* ...or as a macro (use one or the other, not both) */
  #define digittoint(d) ((d) <= '9' ? (d) - '0' : (d) - 'a' + 10)

  size_t hexlength = strlen(s);
  size_t binlength = hexlength / 2;

  unsigned char * buffer = malloc(binlength);
  size_t i = 0;
  char a, b;

  for (; i < binlength; ++i) {
    a = s[2 * i + 0]; b = s[2 * i + 1];
    buffer[i] = (digittoint(a) << 4) | digittoint(b);
  }

I've fixed a bug in your digit-to-int implementation (with a < '9', the digit '9' itself was decoded as if it were a letter), and replaced the + with bitwise OR on the grounds that it better expresses your intent.
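To see the bug concretely: with a < '9', the character '9' falls through to the letter branch. A small sketch comparing the two tests side by side (both function names are made up for this illustration, and the values assume an ASCII character set):

```c
/* Test from the question: '9' takes the letter branch. */
int digit_value_lt(char d)
{
    return d < '9' ? d - '0' : d - 'a' + 10;
}

/* Fixed test: '9' is handled as a decimal digit. */
int digit_value_le(char d)
{
    return d <= '9' ? d - '0' : d - 'a' + 10;
}
```

On an ASCII system digit_value_lt('9') evaluates 57 - 97 + 10 and returns -30, while digit_value_le('9') returns 9.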

You can then experiment to find the best implementation of digittoint - conditional arithmetic as above, strspn, or a lookup table.
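For the lookup-table option, one possible sketch builds a 256-entry table once at startup, so any unsigned char can be used as an index without a range check. The names hex_table, init_hex_table and hex_value are hypothetical, entries of -1 mark non-hex characters, and the letter loop assumes a-f and A-F are contiguous (true in ASCII):

```c
/* 256 entries: one per possible unsigned char value. */
static signed char hex_table[256];

/* Fill the table once, before the first conversion. */
void init_hex_table(void)
{
    int i;

    for (i = 0; i < 256; ++i)
        hex_table[i] = -1;                            /* not a hex digit */
    for (i = 0; i < 10; ++i)
        hex_table['0' + i] = (signed char) i;         /* '0'..'9' */
    for (i = 0; i < 6; ++i) {
        hex_table['a' + i] = (signed char) (10 + i);  /* 'a'..'f' */
        hex_table['A' + i] = (signed char) (10 + i);  /* 'A'..'F' */
    }
}

/* Look up one character; the cast keeps the index in 0..255. */
int hex_value(char c)
{
    return hex_table[(unsigned char) c];
}
```

The explicit init call keeps the sketch simple; a fully static initializer avoids it at the cost of writing out all 256 entries.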

Here's a possible branchless implementation that - bonus! - works on uppercase letters:

static inline int digittoint(char d) {
    return (d & 0x1f) + ((d >> 6) * 0x19) - 0x10;
}
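A quick way to check the branchless formula is to compare it with a plain conditional version for every valid digit. A sketch of such a check, where digittoint_branchless, digittoint_reference and check_branchless are names made up here, and valid ASCII input is assumed:

```c
/* The branchless formula from above, under a different name. */
static int digittoint_branchless(char d)
{
    return (d & 0x1f) + ((d >> 6) * 0x19) - 0x10;
}

/* Straightforward reference version handling both cases. */
static int digittoint_reference(char d)
{
    if (d >= '0' && d <= '9') return d - '0';
    if (d >= 'a' && d <= 'f') return d - 'a' + 10;
    return d - 'A' + 10;  /* caller guarantees 'A'..'F' here */
}

/* Returns 1 if both versions agree on every hex digit. */
int check_branchless(void)
{
    const char *digits = "0123456789abcdefABCDEF";
    int i;

    for (i = 0; digits[i] != '\0'; ++i)
        if (digittoint_branchless(digits[i]) != digittoint_reference(digits[i]))
            return 0;
    return 1;
}
```

Worked through by hand: '0' (0x30) gives 0x10 + 0 - 0x10 = 0; 'a' (0x61) gives 0x01 + 0x19 - 0x10 = 10; 'A' (0x41) has the same low five bits and also has bit 6 set, so it maps to 10 as well.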

Upvotes: 4

Remy Lebeau

Reputation: 596307

Try something like this:

/* signed char, so the -1 entries really compare equal to -1 */
static const signed char bin[128] =
{
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
    -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1
};

size_t hexlength = strlen(s);
size_t binlength = (hexlength / 2);

unsigned char * buffer = (unsigned char *) malloc(binlength);
if (buffer)
{
    /* read through unsigned char so each value is a valid table index */
    const unsigned char *hex = (const unsigned char *) s;

    unsigned char *buf = buffer;
    int b, c;

    int ok = 1;

    for (size_t i = 0; i < hexlength; i += 2)
    {
        /* characters outside 0..127 are not hex digits */
        b = (hex[0] < 128) ? bin[hex[0]] : -1;
        c = (hex[1] < 128) ? bin[hex[1]] : -1;
        hex += 2;

        if ((b == -1) || (c == -1))
        {
            ok = 0;
            break;
        }

        *buf++ = (unsigned char) ((b << 4) | c);
    }

    if (ok == 1)
    {
        // use buffer as needed, up to binlength number of bytes...
    }

    free(buffer);
}
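One thing to watch with a table like this: if its element type is unsigned char, an initializer of -1 is stored as 255, and a test like b == -1 then compares 255 with -1 after integer promotion, which is never true. Using signed char for the table (or testing against 0xFF) avoids that. A tiny sketch of the difference, with function names made up for the illustration and an 8-bit char assumed:

```c
/* Returns 1 if an unsigned char holding "-1" compares equal to -1. */
int unsigned_sentinel_matches(void)
{
    unsigned char u = (unsigned char) -1;  /* stored as 255 */
    return u == -1;                        /* 255 == -1 after promotion: false */
}

/* Same test with a signed char sentinel. */
int signed_sentinel_matches(void)
{
    signed char s = -1;
    return s == -1;                        /* -1 == -1: true */
}
```

Many compilers warn that the first comparison is always false, which is a useful hint that a sentinel check has been silently disabled.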

Upvotes: 1

Ben Richards

Reputation: 528

If you need your number (held in a string) converted from hex to decimal, you may use strtol() with base 16, together with sprintf().

If you need to do it byte by byte, you can buffer each hex pair, and as each buffer is filled, pass it through strtol() and sprintf() like this:

char hexRep[3];   /* one hex pair plus terminator */
char decRep[4];   /* up to "255" plus terminator */
long int decVal;
...
decVal = strtol(hexRep, NULL, 16);
sprintf(decRep, "%ld", decVal);

Both of these are in C's standard library. After you get the string representation of each byte, you could just concatenate them together with strcat().
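Note that atol() only understands decimal digits, so it stops at the first letter of a hex string; strtol() with base 16 is the standard-library call that actually parses hex. A small sketch contrasting the two (parse_with_atol and parse_with_strtol are hypothetical wrappers, not library functions):

```c
#include <stdlib.h>

/* atol() parses decimal: for "2f" it reads "2" and stops at 'f'. */
long parse_with_atol(const char *s)
{
    return atol(s);
}

/* strtol() with base 16 parses the whole hex string. */
long parse_with_strtol(const char *s)
{
    return strtol(s, NULL, 16);
}
```

For "2f", parse_with_atol returns 2 while parse_with_strtol returns 47; for "10" they return 10 and 16 respectively.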

Upvotes: 0
