Reputation: 11
I am trying to write this EBCDIC string (ÑkÀ*) to a file. I want the file to contain exactly that string, but whenever I save it, extra characters are added: the file is 6 bytes (2 hidden characters added) instead of the 4 bytes of ÑkÀ*.
Note: I have the value 6992645, which was converted to hex (6992645C) and then to EBCDIC (ÑkÀ*) so that I can send it to the mainframe. I need the exact EBCDIC string (ÑkÀ*) in the file so the IBM mainframe can read it and convert it back to hex as 6992645C.
def encode_ebcdic(input_str):
    # Take off the COMP-3 identifier and remove the 0 that makes it even
    if 'F' in input_str:
        input_str = input_str.replace('F', '')
    if '0C' in input_str:
        input_str = input_str.replace('0C', 'C')
    print("Hex:", input_str)
    # Convert the hex string to bytes
    input_bytes = bytes.fromhex(input_str)
    # Decode the bytes using EBCDIC encoding
    ebcdic_str = input_bytes.decode('cp037')
    # Return the decoded string
    return ebcdic_str

def write_to_file(output_str):
    with open('output.txt', 'w') as f:
        # Write output to the file
        f.write(output_str)

input_string = "F69926450C"
Upvotes: 1
Views: 396
Reputation: 54733
What you are doing is converting the hex value to bytes, decoding those bytes as EBCDIC into a Unicode string, and then writing that string to a text file, where it gets encoded as UTF-8. Ñ and À each take two bytes in UTF-8, which is where the two extra bytes come from.
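The size difference can be reproduced without touching a file. This is a minimal sketch, assuming Python 3 with its built-in cp037 codec and a text-mode encoding of UTF-8; the variable names are illustrative only:

```python
raw = bytes.fromhex("6992645C")   # the four packed bytes the mainframe expects
text = raw.decode("cp037")        # interpret them as EBCDIC, giving a Unicode str
utf8 = text.encode("utf-8")       # what a text-mode write stores under UTF-8

print(len(raw))    # 4
print(len(text))   # 4 (characters, not bytes)
print(len(utf8))   # 6
print(utf8.hex())  # c3916bc3802a
```

The k and * survive as single bytes, but they are now the ASCII values 0x6b and 0x2a rather than the original 0x92 and 0x5c, so the file content no longer matches the packed value at all.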
You should not worry about EBCDIC here at all. All you need is to write the raw bytes to a file; it is up to the reader to interpret them. So just convert the hex string to bytes and write those bytes to a BINARY file (mode 'wb'):
def encode_ebcdic(input_str):
    # Take off the COMP-3 identifier and remove the 0 that makes it even
    if 'F' in input_str:
        input_str = input_str.replace('F', '')
    if '0C' in input_str:
        input_str = input_str.replace('0C', 'C')
    print("Hex:", input_str)
    # Return the raw bytes; no EBCDIC decoding step
    return bytes.fromhex(input_str)

def write_to_file(output_str):
    # Binary mode writes the bytes unchanged, with no text encoding applied
    with open('output.txt', 'wb') as f:
        f.write(output_str)

input_string = "F69926450C"
write_to_file(encode_ebcdic(input_string))
Output:
timr@Tims-NUC:~/src$ hexdump -C output.txt
00000000 69 92 64 5c |i.d\|
00000004
Naturally, you will not see "ÑkÀ*" in the output file, because your terminal (and mine) doesn't speak EBCDIC. 0x69 is only Ñ if the terminal is EBCDIC. With our ASCII terminals, 0x69 is "i".
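To see that the bytes themselves are the same and only the interpretation differs, the same four bytes can be decoded with two codecs. This is a small sketch using Python's built-in cp037 and ascii codecs; it is an illustration, not part of the fix:

```python
raw = bytes.fromhex("6992645C")

# Read as EBCDIC (code page 037), the bytes spell the string from the question.
print(raw.decode("cp037"))      # ÑkÀ*

# Read as ASCII, the first and last bytes are the characters hexdump showed.
print(raw[:1].decode("ascii"))  # i
print(raw[3:].decode("ascii"))  # \
```

0x92 is outside the ASCII range, which is why hexdump prints a dot in its place.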
Upvotes: 2