roymustang86

Reputation: 8573

Data encoding and decoding using python

This is less a programming question than a conceptual one. I am not a CS major, and I am trying to understand the basic difference between these three formats:

1) EBCDIC 2) Unsigned binary number 3) Binary coded decimal

If this is not a real question, I apologize, but Google was not very useful in explaining this to me.

Say I have a string of digits like "12890". What would its representation be in EBCDIC, as an unsigned binary number, and in BCD format?

Is there a Python 2.6 library I can use to simply convert any string of digits to any of these formats?

For example, to convert a string to EBCDIC, I am doing:

def encodeEbcdic(text):
    return text.decode('latin1').encode('cp037')

print encodeEbcdic('AGNS')

But I get this: ┴╟╒Γ

Upvotes: 1

Views: 4145

Answers (2)

roymustang86

Reputation: 8573

saulspatz, thanks for your explanation. I was able to work out the methods needed to convert a string of digits into each of these encodings. I had to refer to Effective Python, Chapter 1, Item 3: "Know the Differences Between bytes, str, and unicode".

And from there on, I read more about data types and such.

Anyway, to answer my questions :

1) String to EBCDIC:

def encode_ebcdic(text):
    return text.decode('latin1').encode('cp037')

The encoding here is cp037, the code page used in the USA. You can use cp500 for the international one. Here is a list of them: https://en.wikipedia.org/wiki/List_of_EBCDIC_code_pages_with_Latin-1_character_set
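As a rough sketch, the same conversion in Python 3 (an assumption; the function above targets Python 2) needs only `str.encode`. The helper names `to_ebcdic` and `hex_view` are made up for illustration. The sketch also suggests why printing the raw result looks like garbage: the bytes are valid EBCDIC, but a console that does not speak EBCDIC renders them in its own code page. Code page 437 is assumed here because it reproduces the characters shown in the question.

```python
import binascii

def to_ebcdic(text, codepage="cp037"):
    """Encode a text string as EBCDIC bytes (cp037 is the US code page)."""
    return text.encode(codepage)

def hex_view(data):
    """Return the bytes as an uppercase hex string for inspection."""
    return binascii.hexlify(data).decode("ascii").upper()

print(hex_view(to_ebcdic("AGNS")))    # C1C7D5E2
print(hex_view(to_ebcdic("12890")))   # F1F2F8F9F0

# Assumption: the console in the question used code page 437.
# Reading the EBCDIC bytes in that code page gives the box-drawing
# characters U+2534, U+255F, U+2552 and the Greek letter U+0393.
misread = to_ebcdic("AGNS").decode("cp437")
print(misread == u"\u2534\u255f\u2552\u0393")   # True
```

Looking at the hex form instead of printing the raw bytes avoids the confusion.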

2) Hexadecimal String to unsigned binary number :

def str_to_binary(text):
    # Parse the hexadecimal string and return it as an integer.
    return int(text, 16)

This is pretty basic, just convert the Hexadecimal string to a number.
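For the "12890" string from the question, which holds decimal digits, a minimal sketch might parse it in base 10 and then look at the binary form. The 2-byte big-endian layout used with `struct.pack` is an assumption for illustration; a real record format dictates both width and byte order.

```python
import struct

number = int("12890")    # parse the digits as a decimal number
print(bin(number))       # 0b11001001011010
print(hex(number))       # 0x325a

# Assumption: store the value as a 2-byte big-endian unsigned integer ('>H').
packed = struct.pack(">H", number)
print(len(packed))       # 2

# Parsing the same digits as hexadecimal gives a different number.
print(int("12890", 16))  # 75920
```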

3) Hexadecimal string to Binary coded decimal:

def str_to_bcd(text):
    # Python 2 only: the 'hex' codec packs two hex digits into each byte.
    return bytes(text).decode('hex')

Yes, you need to convert it to a byte string so that the BCD conversion can take place: each pair of digits is packed into one byte. Please read saulspatz's answer for what BCD encoding is.
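The `'hex'` codec only exists in Python 2 and fails on an odd number of digits. A sketch that also runs on Python 3 could use `binascii.unhexlify`. The function name `to_packed_bcd` is hypothetical, and padding odd-length input with a leading zero is an assumption; some formats pad differently or reserve a nibble for the sign.

```python
import binascii

def to_packed_bcd(digits):
    """Pack a string of decimal digits into BCD, two digits per byte."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    if len(digits) % 2:
        digits = "0" + digits   # assumption: pad odd lengths with a leading zero
    return binascii.unhexlify(digits.encode("ascii"))

bcd = to_packed_bcd("12890")
print(binascii.hexlify(bcd).decode("ascii"))   # 012890
```

The hex dump of the result reads the same as the decimal digits, which is the defining property of BCD.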

Upvotes: 1

saulspatz

Reputation: 5261

EBCDIC is an IBM character encoding. It's meant for encoding text. Of course numerals can occur in text, as in "1600 Pennsylvania Avenue", so there are codes for numerals, too. To translate 1600 to EBCDIC, you need to find an EBCDIC table. Then you look up the code for 1, the code for 6, and the code for 0 (twice). According to the table at http://www.astrodigital.org/digital/ebcdic.html the EBCDIC codes for 0 through 9 are F0 through F9, respectively. This looks familiar, but I can't say I really remember.

An unsigned binary number is just that. It's the number written in base two. (See below.)

Binary-coded decimal (BCD) is an old format for storing the decimal representation of numbers on a digital computer. Each decimal digit is represented by its binary equivalent. Let's take 64 as an example. Since 64 is just 2 to the sixth power, in binary it's represented as a 1 followed by six 0s: 1000000. In binary-coded decimal, we write the six in binary (0110) and the four in binary (0100), so that the BCD representation is 01100100. We need four bits for each digit, because the largest decimal digit, 9, works out to be 1001. BCD was used extensively in COBOL. If it's used anywhere else these days, I'm not familiar with the application.
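The 64 example can be checked with a few lines of Python. This is only an illustrative sketch, and `bcd_bits` is a made-up helper name.

```python
def bcd_bits(number):
    """Return the BCD bit pattern of a non-negative integer as a string."""
    # Each decimal digit becomes its own 4-bit group.
    return "".join(format(int(d), "04b") for d in str(number))

print(format(64, "b"))   # 1000000   (plain unsigned binary)
print(bcd_bits(64))      # 01100100  (0110 for 6, 0100 for 4)
print(bcd_bits(9))       # 1001      (largest digit still fits in 4 bits)
```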

Edit: I should have explained that F0, F1, etc. in EBCDIC are hex codes, so the F is 1111 and the digits are the same as in BCD. So, EBCDIC for numbers turns out to be the same as BCD, but with an extra 1111 before each digit.
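A short sketch can confirm this relationship using Python's built-in `cp037` codec (one EBCDIC code page, chosen here as an assumption). The helper name `ebcdic_nibbles` is hypothetical.

```python
def ebcdic_nibbles(digit):
    """Split the EBCDIC byte for one digit into its high and low 4 bits."""
    byte = bytearray(digit.encode("cp037"))[0]
    return format(byte >> 4, "04b"), format(byte & 0x0F, "04b")

for d in "1600":
    high, low = ebcdic_nibbles(d)
    print(d, high, low)   # high is always 1111, low is the BCD digit
```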

Upvotes: 2
