Pwrcdr87

Reputation: 965

ASCII Representation of Hexadecimal

I have a string that I converted to hex by running string.format("%02X", char) on each character, which gave me the following:

74657874000000EDD37001000300

In the end, I'd like that string to look like the following:

t e x t NUL NUL NUL í Ó p SOH NUL ETX NUL (spaces are there just for clarification of characters desired in example).

I've tried \x..(hex#) and string.char(0x..(hex#)) (where (hex#) is the two-digit hex representation of my desired character), and I'm still not getting the result I'm looking for. I read another thread on this topic, "what is the way to represent a unichar in lua", along with the links provided in its answers, but I still don't fully understand what my final code needs to do for this to work.

I'm looking for some help in better understanding an approach that would let me achieve the desired result shown above.

ETA:

Well I thought that I had fixed it with the following code:

function hexToAscii(input)
    local convString = ""
    for char in input:gmatch("(..)") do
        convString = convString..(string.char("0x"..char))
    end
    return convString
end

It appeared to work, but I didn't think about characters above 127. Rookie mistake. Now I'm unsure how to get the additional characters, up to 255, to display correctly.

I did the following to check since I couldn't truly "see" them in the file.

function asciiSub(input)
    input = input:gsub(string.char(0x00), "<NUL>")  -- suggested by a coworker
    print(input)
end

I added a few more gsub calls to substitute other characters, and my file comes back with the replacement strings. But when I ran into characters in the extended ASCII range, it all fell apart.

Can anyone assist me in understanding a fix or new approach to this problem? As I've stated before, I read other topics on this and am still confused as to the best approach towards this issue.

Upvotes: 0

Views: 3079

Answers (2)

nobody

Reputation: 4264

The simple way to decode a base16-encoded string is just this:

function unhex( input )
    return (input:gsub( "..", function(c)
        return string.char( tonumber( c, 16 ) )
    end))
end

This is basically what you have, just a bit cleaner. (There's no need to say "(..)", ".." is enough – if you specify no captures, you'll automatically get the whole match. And while it might work if you write string.char( "0x"..c ), it's just evil – you concatenate lots of strings and then trigger the automatic conversion to numbers. Much better to just specify the base when explicitly converting.)

The resulting string should be exactly what went into the hex-dumper, no matter the encoding.
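
One way to convince yourself of this is a round-trip check: decode the hex string, re-encode it, and compare with the original. The sketch below is only an illustration; the hex helper is a hypothetical stand-in for the string.format("%02X", ...) step described in the question, not code taken from it.

```lua
-- Hypothetical re-encoder, mirroring the string.format("%02X", ...) step
-- from the question (an assumption about how the hex dump was made).
local function hex(s)
    return (s:gsub(".", function(c)
        return string.format("%02X", c:byte())
    end))
end

-- Decoder from the answer above.
local function unhex(input)
    return (input:gsub("..", function(c)
        return string.char(tonumber(c, 16))
    end))
end

local encoded = "74657874000000EDD37001000300"
local decoded = unhex(encoded)

print(#decoded)                 -- 14 (one byte per pair of hex digits)
print(hex(decoded) == encoded)  -- true: no byte was lost or altered
```

If the comparison holds, the bytes above 127 are intact, and any remaining problem is in how they are displayed rather than in the conversion.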

If you cannot correctly display the result, your viewer will also be unable to display the original input. If you used different viewers for the original input and the resulting output (e.g. a text editor and a terminal), try writing the output to a file instead and looking at it with the same viewer you used for the original input. The two should then be exactly the same.
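
A minimal sketch of writing the decoded bytes to a file follows. The file name decoded.bin and the helper writeBytes are made up for this example; opening the file in binary mode ("wb") is there so that platforms which translate line endings in text mode leave the bytes untouched.

```lua
-- Decoder from the answer above.
local function unhex(input)
    return (input:gsub("..", function(c)
        return string.char(tonumber(c, 16))
    end))
end

-- Hypothetical helper: write a byte string to a file unchanged.
local function writeBytes(path, data)
    local f = assert(io.open(path, "wb"))  -- "b" = binary mode
    f:write(data)
    f:close()
end

-- "decoded.bin" is a placeholder name, not a file from the question.
writeBytes("decoded.bin", unhex("74657874000000EDD37001000300"))
```

Opening that file in the viewer used for the original input (or in a hex viewer) should then show the same bytes.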

Getting viewers that assume different encodings (e.g. one of the "old" 8-bit code pages or one of the many versions of Unicode) to display the same thing will require conversion between different formats, which tends to be quite complicated or even impossible. As you did not mention what encodings are involved (nor any other information like OS or programs used that might hint at the likely encodings), this could be just about anything, so it's impossible to say anything more specific on that.

Upvotes: 3

roeland

Reputation: 5751

You actually have a couple of problems:

  • First, make sure you know the meaning of the term character encoding, and that you know the difference between characters and bytes. A popular post on the topic is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

  • Then, what encoding was used for the bytes you just received? You need to know this, otherwise you don't know what byte 234 means. For example, it could be ISO-8859-1, in which case it is U+00EA, the character ê.

  • The characters 0 to 31 are control characters (e.g. 0 is NUL). Use a lookup table for these.

  • Then, displaying the characters on the terminal is the hard part. There is no platform-independent way to display ê on the terminal. It may well be impossible with the standard print function. If you can't figure this step out you can search for a question dealing specifically with how to print Unicode text from Lua.
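
The points above can be sketched together as follows. This rests on two assumptions that the question does not confirm: that the bytes are ISO-8859-1, and that the output goes somewhere that interprets UTF-8. The names controlNames and describe are invented for this example.

```lua
-- Lookup table for the control characters 0..31, plus DEL (127).
local controlNames = {
    [0] = "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
}
controlNames[127] = "DEL"

-- Decoder from the other answer, included so the sketch is self-contained.
local function unhex(input)
    return (input:gsub("..", function(c)
        return string.char(tonumber(c, 16))
    end))
end

-- Hypothetical helper: render each byte in a visible form.
local function describe(bytes)
    local out = {}
    for i = 1, #bytes do
        local b = bytes:byte(i)
        if controlNames[b] then
            out[#out + 1] = "<" .. controlNames[b] .. ">"
        elseif b < 128 then
            out[#out + 1] = string.char(b)  -- printable ASCII, unchanged
        else
            -- ASSUMPTION: input is ISO-8859-1, so byte value == code point,
            -- and code points 128..255 take exactly two bytes in UTF-8.
            out[#out + 1] = string.char(0xC0 + math.floor(b / 64),
                                        0x80 + b % 64)
        end
    end
    return table.concat(out, " ")
end

print(describe(unhex("74657874000000EDD37001000300")))
```

On a UTF-8 terminal this should print t e x t <NUL> <NUL> <NUL> í Ó p <SOH> <NUL> <ETX> <NUL>. If the data is in a different encoding, or the terminal does not use UTF-8, the branch for bytes above 127 would need a different mapping.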

Upvotes: 0
