PackedUp

Reputation: 391

EBCDIC to ASCII in Ruby

I have an EBCDIC file that was generated on a mainframe, and I need to convert it to ASCII for data processing.
Any help would be appreciated.

Upvotes: 0

Views: 975

Answers (2)

Mike Slinn

Reputation: 8407

There are many flavors of EBCDIC. The following lists every encoding whose name starts with IBM in Ruby 3.1.2p20. Most of them are DOS/OEM code pages; only IBM037 is EBCDIC:

Encoding.name_list.select { |x| x.start_with?('IBM') }.sort.join(", ")

Output is:

IBM037, IBM437, IBM720, IBM737, IBM775, IBM850, IBM852, IBM855, IBM857, IBM860, IBM861, IBM862, IBM863, IBM864, IBM865, IBM866, IBM869

I am unaware of any way to determine which is in use, except to notice when a problem occurs while converting.

To convert from IBM037 to UTF-8:

File.read('some_ibm_file', encoding: 'IBM037:UTF-8')
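
As a self-contained check of that call, here is a minimal sketch that writes a few known IBM037 bytes to a temporary file and reads them back as UTF-8. The helper name `read_ebcdic` and the sample bytes are made up for illustration, and the sketch assumes the data really is code page 037:

```ruby
require 'tempfile'

# Hypothetical helper: read a whole file as IBM037 and transcode it to UTF-8.
def read_ebcdic(path)
  File.read(path, encoding: 'IBM037:UTF-8')
end

# The text "HELLO 123" as IBM037 bytes
# (H=C8, E=C5, L=D3, O=D6, space=40, digits F1..F3).
sample = [0xC8, 0xC5, 0xD3, 0xD3, 0xD6, 0x40, 0xF1, 0xF2, 0xF3].pack('C*')

Tempfile.create('ebcdic') do |f|
  f.binmode          # write the raw bytes untouched
  f.write(sample)
  f.flush
  puts read_ebcdic(f.path)
end
```

If the printed text looks wrong (brackets, carets, or currency signs in odd places), the file probably uses a different EBCDIC code page than 037.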

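However, the Linux iconv command lists 187 IBM code page names. Some of these appear to be aliases of each other (IBM-1047 and IBM1047, for example), and not all of them are EBCDIC: IBM437 and IBM850, for instance, are DOS code pages.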

$ iconv -l | grep '^IBM.*' | tr -d '//' | sort | \
  tr '\n' ' ' | fold -sw 70

IBM-1008 IBM-1025 IBM-1046 IBM-1047 IBM-1097 IBM-1112 IBM-1122
IBM-1123 IBM-1124 IBM-1129 IBM-1130 IBM-1132 IBM-1133 IBM-1137
IBM-1140 IBM-1141 IBM-1142 IBM-1143 IBM-1144 IBM-1145 IBM-1146
IBM-1147 IBM-1148 IBM-1149 IBM-1153 IBM-1154 IBM-1155 IBM-1156
IBM-1157 IBM-1158 IBM-1160 IBM-1161 IBM-1162 IBM-1163 IBM-1164
IBM-1166 IBM-1167 IBM-12712 IBM-1364 IBM-1371 IBM-1388 IBM-1390
IBM-1399 IBM-16804 IBM-4517 IBM-4899 IBM-4909 IBM-4971 IBM-5347
IBM-803 IBM-856 IBM-901 IBM-902 IBM-9030 IBM-9066 IBM-921 IBM-922
IBM-930 IBM-932 IBM-933 IBM-935 IBM-937 IBM-939 IBM-943 IBM-9448
IBM037 IBM038 IBM1004 IBM1008 IBM1025 IBM1026 IBM1046 IBM1047 IBM1089
IBM1097 IBM1112 IBM1122 IBM1123 IBM1124 IBM1129 IBM1130 IBM1132
IBM1133 IBM1137 IBM1140 IBM1141 IBM1142 IBM1143 IBM1144 IBM1145
IBM1146 IBM1147 IBM1148 IBM1149 IBM1153 IBM1154 IBM1155 IBM1156
IBM1157 IBM1158 IBM1160 IBM1161 IBM1162 IBM1163 IBM1164 IBM1166
IBM1167 IBM12712 IBM1364 IBM1371 IBM1388 IBM1390 IBM1399 IBM16804
IBM256 IBM273 IBM274 IBM275 IBM277 IBM278 IBM280 IBM281 IBM284 IBM285
IBM290 IBM297 IBM367 IBM420 IBM423 IBM424 IBM437 IBM4517 IBM4899
IBM4909 IBM4971 IBM500 IBM5347 IBM775 IBM803 IBM813 IBM819 IBM848
IBM850 IBM851 IBM852 IBM855 IBM856 IBM857 IBM858 IBM860 IBM861 IBM862
IBM863 IBM864 IBM865 IBM866 IBM866NAV IBM868 IBM869 IBM870 IBM871
IBM874 IBM875 IBM880 IBM891 IBM901 IBM902 IBM903 IBM9030 IBM904
IBM905 IBM9066 IBM912 IBM915 IBM916 IBM918 IBM920 IBM921 IBM922
IBM930 IBM932 IBM933 IBM935 IBM937 IBM939 IBM943 IBM9448
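
For a code page that Ruby does not ship but iconv does, one option is to pipe the bytes through the iconv command from Ruby. This is only a sketch: `iconv_to_utf8` is a hypothetical helper name, IBM1047 is just an example code page, and it assumes an `iconv` binary on the PATH that supports the requested name.

```ruby
require 'open3'

# Hypothetical helper: convert raw bytes to UTF-8 with the system iconv command.
# `from` is any code page name that the local iconv accepts (see `iconv -l`).
def iconv_to_utf8(bytes, from:)
  out, status = Open3.capture2('iconv', '-f', from, '-t', 'UTF-8',
                               stdin_data: bytes, binmode: true)
  raise "iconv could not convert from #{from}" unless status.success?
  out.force_encoding('UTF-8')
end

# Example call, left commented out because the file name is a placeholder:
# text = iconv_to_utf8(File.binread('some_ibm_file'), from: 'IBM1047')
```

Reading the file with `File.binread` keeps Ruby from touching the bytes before iconv sees them.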

The source code for win_iconv.c contains the following comments:

{37, "IBM037"}, /* IBM EBCDIC US-Canada */
{437, "IBM437"}, /* OEM United States */
{500, "IBM500"}, /* IBM EBCDIC International */
{708, "ASMO-708"}, /* Arabic (ASMO 708) */
/* 709      Arabic (ASMO-449+, BCON V4) */
/* 710      Arabic - Transparent Arabic */
{720, "DOS-720"}, /* Arabic (Transparent ASMO); Arabic (DOS) */
{737, "ibm737"}, /* OEM Greek (formerly 437G); Greek (DOS) */
{775, "ibm775"}, /* OEM Baltic; Baltic (DOS) */
{850, "ibm850"}, /* OEM Multilingual Latin 1; Western European (DOS) */
{852, "ibm852"}, /* OEM Latin 2; Central European (DOS) */
{855, "IBM855"}, /* OEM Cyrillic (primarily Russian) */
{857, "ibm857"}, /* OEM Turkish; Turkish (DOS) */
{858, "IBM00858"}, /* OEM Multilingual Latin 1 + Euro symbol */
{860, "IBM860"}, /* OEM Portuguese; Portuguese (DOS) */
{861, "ibm861"}, /* OEM Icelandic; Icelandic (DOS) */
{862, "DOS-862"}, /* OEM Hebrew; Hebrew (DOS) */
{863, "IBM863"}, /* OEM French Canadian; French Canadian (DOS) */
{864, "IBM864"}, /* OEM Arabic; Arabic (864) */
{865, "IBM865"}, /* OEM Nordic; Nordic (DOS) */
{870, "IBM870"}, /* IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2 */
{1026, "IBM1026"}, /* IBM EBCDIC Turkish (Latin 5) */
{1047, "IBM01047"}, /* IBM EBCDIC Latin 1/Open System */
{1140, "IBM01140"}, /* IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro) */
{1141, "IBM01141"}, /* IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro) */
{1142, "IBM01142"}, /* IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro) */
{1143, "IBM01143"}, /* IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro) */
{1144, "IBM01144"}, /* IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro) */
{1145, "IBM01145"}, /* IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro) */
{1146, "IBM01146"}, /* IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro) */
{1147, "IBM01147"}, /* IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro) */
{1148, "IBM01148"}, /* IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro) */
{1149, "IBM01149"}, /* IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro) */

Upvotes: 0

knut

Reputation: 27875

The EBCDIC encoding IBM037 [has been available since Ruby 2.3][1]:

Encoding

new Encoding::IBM037 (alias ebcdic-cp-us; dummy)

So this should work:

src = 'out_26877296.tst'
content = File.read(src, encoding: 'IBM037:ASCII')
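
One caveat with an ASCII target: IBM037 contains characters that ASCII cannot represent (the cent sign at byte 0x4A, for instance), and a strict conversion should raise Encoding::UndefinedConversionError when it meets one. Below is a minimal sketch of a lossy fallback, assuming it is acceptable to replace such characters with `?`. The helper name `ebcdic_to_ascii` is hypothetical:

```ruby
# Hypothetical helper: decode IBM037 bytes to UTF-8 first, then reduce the
# result to US-ASCII, replacing anything ASCII cannot represent with "?".
def ebcdic_to_ascii(bytes)
  utf8 = bytes.dup.force_encoding('IBM037').encode('UTF-8')
  utf8.encode('US-ASCII', undef: :replace, replace: '?')
end

# "A", cent sign, "B" as IBM037 bytes.
raw = [0xC1, 0x4A, 0xC2].pack('C*')
puts ebcdic_to_ascii(raw)
```

If the data has to survive intact, converting to UTF-8 and stopping there avoids the loss entirely.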

Upvotes: 0
