Reputation: 881973
What is the simplest way in C to convert an EBCDIC-encoded string to its ASCII equivalent in-place?
The only characters that need to be converted are the space, alphanumerics, and those from the set <=>()+-*/&|!$#@.,;%_?" (the double quote is part of the set). All other characters can simply be replaced with '.'.
The function signature will basically be:
void ebcdicToAscii (char *s);
At the moment, I'm leaning towards a series of lookup tables and multiple if statements for the various EBCDIC sections, but I wonder if there's a better way.
Upvotes: 2
Views: 11066
Reputation: 133
The marked answer here didn't work for me; it failed for special characters such as ^ and |. So I've written a table generator that produces a table for a given code page. It's C# on .NET 8. I hope this helps someone.
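using System.Text;

int codePage = 37; // IBM-037; change the code page here

// EBCDIC code pages are only available once this provider is registered.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding ebcdicEncoding = Encoding.GetEncoding(codePage);

// Pair each character 0..255 with the EBCDIC byte it encodes to.
var pairs = new List<Tuple<byte, byte>>();
for (int i = 0; i < 256; i++)
{
    byte[] ebcdic = ebcdicEncoding.GetBytes(new char[] { (char)i });
    pairs.Add(new Tuple<byte, byte>((byte)i, ebcdic[0]));
}

// Sort by EBCDIC byte so that entry N of the output belongs to EBCDIC code N
// (this relies on the code page mapping the 256 characters one-to-one).
var orderedList = pairs.OrderBy(x => x.Item2).ToList();

Console.WriteLine("static const unsigned char e2a[256] = {");
foreach (var item in orderedList)
    Console.Write(item.Item1 + ",");
Console.WriteLine("};");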
https://gist.github.com/anuradhaindika83/3a257005fb9c48fd2fe72be5ad6050e1
Upvotes: 0
Reputation: 43472
You probably want a translation table. That'd be a one-dimensional array of 256 elements; each one is positioned at its EBCDIC location, and its value is the ASCII value of the same character.
const char ebcdicToAsciiTable[256];
Then, to convert in-place:
#include <string.h>  /* strlen */

void ebcdicToAscii(char *s) {
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        s[i] = ebcdicToAsciiTable[(unsigned char)(s[i])];  /* cast avoids negative indices */
}
The table's content is left as an exercise for the reader. ;)
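One way to avoid typing out all 256 values is to build the table at run time. The following is only a sketch under stated assumptions: the names e2aTable, fillRun, and initE2aTable are hypothetical, the code positions assume IBM code page 37, and every character outside the question's set is mapped to '.'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table name; filled in by initE2aTable() before first use. */
static unsigned char e2aTable[256];

/* Map consecutive EBCDIC codes, starting at `start`, to the given ASCII characters. */
static void fillRun(unsigned char start, const char *ascii) {
    for (size_t i = 0; ascii[i] != '\0'; i++)
        e2aTable[start + i] = (unsigned char)ascii[i];
}

/* Assumption: EBCDIC positions follow IBM code page 37. */
static void initE2aTable(void) {
    memset(e2aTable, '.', sizeof e2aTable);  /* default for everything else */
    e2aTable[0x00] = '\0';                   /* keep the string terminator */
    e2aTable[0x40] = ' ';                    /* the real space */
    fillRun(0x81, "abcdefghi");
    fillRun(0x91, "jklmnopqr");
    fillRun(0xA2, "stuvwxyz");
    fillRun(0xC1, "ABCDEFGHI");
    fillRun(0xD1, "JKLMNOPQR");
    fillRun(0xE2, "STUVWXYZ");
    fillRun(0xF0, "0123456789");
    fillRun(0x4B, ".<(+|&");                 /* 0x4B..0x50 */
    fillRun(0x5A, "!$*);");                  /* 0x5A..0x5E */
    fillRun(0x60, "-/");                     /* 0x60..0x61 */
    fillRun(0x6B, ",%_>?");                  /* 0x6B..0x6F */
    fillRun(0x7B, "#@");                     /* 0x7B..0x7C */
    e2aTable[0x7E] = '=';
    e2aTable[0x7F] = '"';
}

/* In-place conversion using the table above. */
void ebcdicToAscii(char *s) {
    for (; *s != '\0'; s++)
        *s = (char)e2aTable[(unsigned char)*s];
}
```

Call initE2aTable() once at startup, then ebcdicToAscii() on each string. A hard-coded const table avoids the init step, at the cost of being harder to read and check by eye.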
Upvotes: 3
Reputation: 15769
Using the table from here, off the top of my head:
static const unsigned char e2a[256] = {
0, 1, 2, 3,156, 9,134,127,151,141,142, 11, 12, 13, 14, 15,
16, 17, 18, 19,157,133, 8,135, 24, 25,146,143, 28, 29, 30, 31,
128,129,130,131,132, 10, 23, 27,136,137,138,139,140, 5, 6, 7,
144,145, 22,147,148,149,150, 4,152,153,154,155, 20, 21,158, 26,
32,160,161,162,163,164,165,166,167,168, 91, 46, 60, 40, 43, 33,
38,169,170,171,172,173,174,175,176,177, 93, 36, 42, 41, 59, 94,
45, 47,178,179,180,181,182,183,184,185,124, 44, 37, 95, 62, 63,
186,187,188,189,190,191,192,193,194, 96, 58, 35, 64, 39, 61, 34,
195, 97, 98, 99,100,101,102,103,104,105,196,197,198,199,200,201,
202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
123, 65, 66, 67, 68, 69, 70, 71, 72, 73,232,233,234,235,236,237,
125, 74, 75, 76, 77, 78, 79, 80, 81, 82,238,239,240,241,242,243,
92,159, 83, 84, 85, 86, 87, 88, 89, 90,244,245,246,247,248,249,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57,250,251,252,253,254,255
};
void ebcdicToAscii (unsigned char *s)
{
    while (*s)
    {
        *s = e2a[(int) (*s)];
        s++;
    }
}
For your specific requirements, I would suggest something like:
#include <stdio.h>

void inSituEbcdicToAscii (char *s) {
    /* Each row covers 32 EBCDIC codes, so the spacing inside the strings matters. */
    static char etoa[] =
        "                                "
        "                                "
        "           .<(+|&         !$*); "  // first char here is real space
        "-/         ,%_>?         `:#@'=\""
        " abcdefghi       jklmnopqr      "
        "  stuvwxyz                      "
        " ABCDEFGHI       JKLMNOPQR      "
        "  STUVWXYZ      0123456789      ";

    while (*s != '\0') {
        *s = etoa[(unsigned char)*s];
        s++;
    }
}

int main (void) {
    char str[] = "\xc8\x85\x93\x93\x96\x40\xa3\x88\x85\x99\x85\x5a";
    inSituEbcdicToAscii (str);
    printf ("%s\n", str);
    return 0;
}
which outputs Hello there! from the equivalent EBCDIC characters. All other characters beyond those you showed an interest in are converted to a space, though you can change that to something else (make sure you don't modify EBCDIC code 0x40, which is the real space).
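Changing the filler by hand means retyping the whole table. As a rough sketch of an alternative, a small helper could patch the table once at startup. The name setFiller is hypothetical, and this assumes the table is a writable 256-entry array that the helper can reach (etoa above is a static local, so it would have to be moved to file scope first).

```c
#include <stddef.h>

/* Hypothetical helper: replace every filler space in a 256-entry
   EBCDIC-to-ASCII table with `filler`, leaving the real space at
   EBCDIC code 0x40 untouched. */
static void setFiller(char table[256], char filler) {
    for (size_t i = 0; i < 256; i++) {
        if (table[i] == ' ' && i != 0x40)
            table[i] = filler;
    }
}
```

For the question's requirement, this would be called once as setFiller(etoa, '.') before the first conversion.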
Upvotes: 11
Reputation: 400454
The simplest would be to use a 256-entry lookup table. Here's one way to generate such a table using Python:
# Python 3
print('static const char kEbcdicToAscii[256] = {')
for i in range(256):
    print('    %d,' % ord(bytes([i]).decode('cp500')))
print('};')
Then to decode:
void ebcdicToAscii(char *s)
{
    while (*s)
    {
        /* Increment separately: reading and modifying s in one expression is undefined. */
        *s = kEbcdicToAscii[(unsigned char)*s];
        s++;
    }
}
This will also likely be the fastest method, since the 256-byte table will easily fit in your L1 cache. If you really want to convert other characters to '.' instead of converting them properly, then modify the table like so:
# Python 3
import string

allowed = string.ascii_letters + string.digits + ' <=>()+-*/&|!$#@.,;%_?"'
print('static const char kEbcdicToAscii[256] = {')
for i in range(256):
    asc = bytes([i]).decode('cp500')
    if asc not in allowed:
        asc = '.'
    print('    %d,' % ord(asc))
print('};')
Upvotes: 1