Veronica Guzman

Reputation: 29

Read EBCDIC file in Python

I've been trying to read a .ebc file but have been unable to. I would like to save it as a .csv or .txt file. (I'm new to this file format, so I'm unsure how to move forward.)

This is a description of the file, and the data is publicly available here (under Underground Injection Control Data)

I've tried the approaches from these threads, but nothing has worked: "Reading a mainframe EBCDIC File", "How can I open .ebc (ebcdic) file on my laptop via python?", and "Convert EBCDIC file to ASCII using Python 2".

1. Does not print anything:

import codecs

with open("uif700a.txt", "rb") as ebcdic:
    ascii_txt = codecs.decode(ebcdic.read(), "cp500")
    print(ascii_txt)

2. Does not print anything:

with open("uif700.ebc", encoding='cp500') as f:
    print(f.read())

3. The file is also available as ASCII, so I tried:

data = pd.read_csv('uif700a.txt', on_bad_lines='skip', encoding = "cp037",header=None)

EmptyDataError: No columns to parse from file

Upvotes: 1

Views: 1121

Answers (1)

Perhaps this reply is too late now (5 months later), but I'll give it a try. I had the same problem a couple of years ago with RRC EBC files, and I had to reach out to a friend at IBM to solve it. This is what I did. Hope it helps.

Note: this is not my entire code, so it probably won't run as is.

  1. It is not a character-delimited file (like CSV). It's a fixed-length dataset. In my solution, I created a configuration file in YAML to describe the positions:
  • Total row size
  • The index ID (RRC-TAPE-RECORD-ID)
  • The field name (UIC-CNTL-NO)
  • Character start (counted from 1, so the first field starts at 1)
  • Length (the field above has a length of 9 according to the documentation you shared)
  • Type (because numbers need a special conversion; strings are simpler)

YAML:

"GLOBAL":
  record_length: 622
'01':
  RRC-TAPE-RECORD-ID:
    start: 1
    length: 2
    type: string
  UIC-CNTL-NO:
    start: 3
    length: 9
    type: number
  ...

You need to do this for every field you are using (tedious). Note that if a value doesn't fill the entire field length, the remainder is padded with spaces.
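
As a rough illustration of how such a config maps onto byte positions, the sketch below uses a plain Python dict shaped like the YAML (roughly what a YAML loader would hand back) and turns each 1-based start/length pair into a Python slice. The helper names `field_slice` and `check_layout` are made up for this example, the two fields are just the ones listed in the config, and the record length is the 622 mentioned for this file.

```python
# Hypothetical in-memory version of the layout config (same shape as the YAML).
layout = {
    "GLOBAL": {"record_length": 622},
    "01": {
        "RRC-TAPE-RECORD-ID": {"start": 1, "length": 2, "type": "string"},
        "UIC-CNTL-NO": {"start": 3, "length": 9, "type": "number"},
    },
}


def field_slice(spec):
    """Convert a 1-based start and a length into a 0-based Python slice."""
    begin = spec["start"] - 1
    return slice(begin, begin + spec["length"])


def check_layout(fields, record_length):
    """Raise ValueError if a field overlaps the previous one or overruns the record."""
    previous_end = 0
    for name, spec in sorted(fields.items(), key=lambda kv: kv[1]["start"]):
        s = field_slice(spec)
        if s.start < previous_end:
            raise ValueError(f"{name} overlaps the previous field")
        if s.stop > record_length:
            raise ValueError(f"{name} runs past the end of the record")
        previous_end = s.stop


check_layout(layout["01"], layout["GLOBAL"]["record_length"])
slices = {name: field_slice(spec) for name, spec in layout["01"].items()}
print(slices)
```

Checking the layout once up front tends to catch off-by-one mistakes in the start positions before they show up as garbage in the decoded columns.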

  2. Split the entire file into chunks. Each chunk ends at the record length given in the specification (in your case it is 622, and it is stored in the config file).

Python 3:

chunks = []
with open(path, 'rb') as f:
    data = f.read()
    print(f'Data size: {len(data):,}')
    n = config['GLOBAL']['record_length']
    for i in range(0,len(data),n):
        chunks.append(data[i:i+n])

Now you have a list of values, but not decoded. They are all in binary.
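
The fixed-length split is easy to sanity-check before decoding anything. Here is a minimal sketch, using made-up data and a hypothetical helper name `split_records`: it refuses to split when the data size is not an exact multiple of the record length, which may indicate that the record length is wrong or that the file contains extra bytes.

```python
def split_records(data, record_length):
    """Split raw bytes into fixed-length records, refusing a ragged tail."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    leftover = len(data) % record_length
    if leftover:
        raise ValueError(
            f"{len(data)} bytes is not a multiple of {record_length} "
            f"({leftover} bytes left over)"
        )
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


# Made-up data: three 4-byte records.
records = split_records(b"AAAABBBBCCCC", 4)
print(records)  # [b'AAAA', b'BBBB', b'CCCC']
```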

  3. Create a function to decode the numbers. This was the challenging part to figure out, but in the end it was not complicated.

Python 3:

def packed(raw):  # raw: the bytes of one packed number field
    n = ['']  # n[0] is reserved for the sign
    for b in raw[:-1]:  # every byte except the last holds two digits
        hi, lo = divmod(b, 16)
        n.append(str(hi))
        n.append(str(lo))
    digit, sign = divmod(raw[-1], 16)  # last byte: one digit plus the sign
    n.append(str(digit))
    if sign in (0x0b, 0x0d):  # 0xB and 0xD mark a negative value
        n[0] = '-'
    else:
        n[0] = '+'
    return float(''.join(n))
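
To make the packed layout concrete: each byte carries two 4-bit halves, every half except the last is a decimal digit, and the last half is the sign. The sketch below is a variation on the function above, not a replacement for it; it reads the digits through `bytes.hex()` and returns an `int` rather than a `float`. The name `unpack_comp3` and the sample bytes are made up for illustration.

```python
def unpack_comp3(raw):
    """Decode a packed-decimal field: two digits per byte, sign in the last half-byte."""
    nibbles = raw.hex()  # e.g. b'\x12\x34\x5c' -> '12345c'
    digits, sign = nibbles[:-1], nibbles[-1]
    if not digits.isdigit():
        raise ValueError(f"not a packed-decimal field: {nibbles}")
    value = int(digits)
    # Same convention as packed(): 0xB and 0xD mean negative, anything else positive.
    return -value if sign in "bd" else value


print(unpack_comp3(bytes.fromhex("12345c")))  # 12345
print(unpack_comp3(bytes.fromhex("00123d")))  # -123
```

Returning an `int` avoids float rounding on long fields; whether a field also has an implied decimal point is something only the record documentation can tell you.
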

  4. Use a library to decode the strings

Python 3:

import codecs
#Do this for EVERY string field in the Config YAML file
decode_text = codecs.getdecoder('cp037')
record_type = decode_text(chunk[0:2])[0]
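
Because every string field is decoded the same way, the slice-and-decode step can be wrapped in a small helper. This is a sketch that assumes 1-based start positions as in the config; `decode_field` is a hypothetical name, and the sample record is built here by encoding text to cp037, not taken from the real file. It strips the padding after decoding, since the EBCDIC space is byte 0x40 and `bytes.strip()` only removes ASCII whitespace.

```python
import codecs

decode_text = codecs.getdecoder("cp037")


def decode_field(chunk, start, length):
    """Decode one string field (1-based start) and drop the space padding."""
    raw = chunk[start - 1:start - 1 + length]
    text, _consumed = decode_text(raw)
    return text.strip()


# Made-up record: a 2-character ID followed by a 9-character padded field.
sample = ("01" + "ABC".ljust(9)).encode("cp037")

print(decode_field(sample, 1, 2))  # 01
print(decode_field(sample, 3, 9))  # ABC
# bytes.strip() leaves the 0x40 padding untouched:
print(sample[2:11].strip() == sample[2:11])  # True
```
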

  5. Put it all together: loop over each row and each column, and then you can finally load the result into pandas

Python 3:

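import pandas as pd

def decode_records(chunks, config):
    # All fields of record type '01'. The first column of that table
    # (RRC-TAPE-RECORD-ID) will always be '01'.
    config_fields = config['01']
    # Field names become the columns; [1:] skips the record ID itself
    cols = list(config_fields.keys())[1:]
    lines = []
    for chunk in chunks:
        line = []
        for name in cols:
            variable = config_fields[name]
            start = variable['start']
            length = variable['length']
            raw = chunk[start-1:start-1+length]  # config positions are 1-based
            if variable['type'] in ('string', 'display'):
                # Decode first, then strip: EBCDIC spaces are 0x40,
                # which bytes.strip() does not remove
                result = decode_text(raw)[0].strip()
            elif variable['type'] in ('number', 'packed'):
                result = packed(raw)
            else:
                raise ValueError(f"unknown type for field {name}")
            line.append(result)
        lines.append(line)
    return pd.DataFrame(lines, columns=cols)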
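
Since the original goal was a .csv file, and the config is keyed by record ID, one possible last step is to keep only the records of one type and write them out. The sketch below uses only the standard library and entirely made-up records; `rows_for_record_type` is a hypothetical helper, and for brevity it treats every field as a string. It also assumes the file may mix record types, which is worth checking against the documentation. With a DataFrame in hand, pandas' `DataFrame.to_csv` would serve the same purpose.

```python
import csv
import io


def rows_for_record_type(chunks, fields, record_id, encoding="cp037"):
    """Yield one dict per chunk whose first two bytes decode to record_id."""
    for chunk in chunks:
        if chunk[:2].decode(encoding) != record_id:
            continue  # a different record type; it would need its own layout
        yield {
            name: chunk[start - 1:start - 1 + length].decode(encoding).strip()
            for name, (start, length) in fields.items()
        }


# Made-up layout (1-based start, length) and made-up 11-byte records.
fields = {"RRC-TAPE-RECORD-ID": (1, 2), "UIC-CNTL-NO": (3, 9)}
chunks = [
    ("01" + "000012345").encode("cp037"),
    ("02" + "SOMETHING").encode("cp037"),
    ("01" + "000067890").encode("cp037"),
]

out = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.DictWriter(out, fieldnames=list(fields))
writer.writeheader()
writer.writerows(rows_for_record_type(chunks, fields, "01"))
print(out.getvalue())
```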

Upvotes: 3
