Reputation: 29
I've been trying to read a .ebc file, without success. I would like to save it as a .csv or .txt file. (I'm new to this file format, so I'm unsure how to move forward.)
This is a description of the file, and the data is publicly available here (under Underground Injection Control Data)
I've tried these threads, but nothing has worked: "Reading a mainframe EBCDIC File", "How can I open .ebc (ebcdic) file on my laptop via python?", and "Convert EBCDIC file to ASCII using Python 2".
1, does not print anything:
import codecs
with open("uif700a.txt", "rb") as ebcdic:
    ascii_txt = codecs.decode(ebcdic.read(), "cp500")
    print(ascii_txt)
2, does not print anything:
with open("uif700.ebc", encoding='cp500') as f:
    print(f.read())
3, file is also available as ASCII, so I tried:
data = pd.read_csv('uif700a.txt', on_bad_lines='skip', encoding = "cp037",header=None)
EmptyDataError: No columns to parse from file
Upvotes: 1
Views: 1121
Reputation: 35
Perhaps this reply is too late now (5 months later), but I'll give it a try. I had the same problem a couple of years ago with RRC EBC files, and I had to reach out to a friend at IBM to solve it. This is what I did. Hope it helps.
Note: this is not my entire code, so it probably won't run as is.
YAML:
"GLOBAL":
    record_length: 22
'01':
    RRC-TAPE-RECORD-ID:
        start: 1
        length: 2
        type: string
    UIC-CNTL-NO:
        start: 3
        length: 9
        type: number
    ...
You need to do that for every field you are using (tedious). Note that if a value doesn't take up the entire field length, the rest of the field is padded with spaces.
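To illustrate the padding note, here is a minimal sketch. It assumes the cp037 code page and uses a made-up 9-byte field, not a real field from the RRC layout. In EBCDIC the space character is byte 0x40, so it is safest to decode first and strip afterwards.

```python
# Hypothetical 9-byte text field: "ABC" followed by six EBCDIC spaces (0x40).
raw_field = "ABC".encode("cp037") + b"\x40" * 6

text = raw_field.decode("cp037")  # still padded to the full field length
value = text.rstrip()             # padding removed after decoding

# Stripping the raw bytes has no effect here: bytes.strip() only removes
# ASCII whitespace, and 0x40 is not ASCII whitespace.
unchanged = raw_field.strip() == raw_field
```

Here `text` is 9 characters long, `value` is `'ABC'`, and `unchanged` is `True`.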
Python 3:
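chunks = []
with open(path, 'rb') as f:  # path: location of the .ebc file
    data = f.read()
print(f'Data size: {len(data):,}')
# config: the YAML above, loaded into a dict
n = config['GLOBAL']['record_length']
# Slice the raw bytes into fixed-length records
for i in range(0, len(data), n):
    chunks.append(data[i:i+n])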
Now you have a list of fixed-length records, but they are not decoded yet. Each one is still raw bytes.
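As a small sketch of the same splitting step, with a sanity check added: if the file size is not a multiple of the record length, the record length is probably wrong for that file. The function name `split_records` and the sample bytes are made up for illustration.

```python
def split_records(data: bytes, record_length: int) -> list:
    """Split a byte string into consecutive fixed-length records."""
    leftover = len(data) % record_length
    if leftover:
        # The last record would be short; the record length may be wrong.
        print(f"Warning: {leftover} leftover byte(s)")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


# Synthetic data: three 4-byte records (not real RRC data).
sample = b"AAAABBBBCCCC"
records = split_records(sample, 4)
```

With the sample above, `records` holds three 4-byte `bytes` objects.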
Python 3:
def packed(raw):  # raw: the bytes of one packed-decimal field
    n = ['']  # n[0] is reserved for the sign
    for b in raw[:-1]:  # every byte except the last holds two digits
        hi, lo = divmod(b, 16)
        n.append(str(hi))
        n.append(str(lo))
    digit, sign = divmod(raw[-1], 16)  # last byte: one digit plus the sign nibble
    n.append(str(digit))
    if sign in (0x0b, 0x0d):  # 0xB and 0xD mark a negative value
        n[0] = '-'
    else:
        n[0] = '+'
    return float(''.join(n))
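A worked example may help here. Packed decimal (COMP-3) stores two digits per byte, and the low nibble of the last byte carries the sign. The sketch below is a variant of the function above that returns an `int` instead of a `float`. That is my own substitution, and the name `unpack_comp3` is hypothetical. Any implied decimal places defined in the record layout would still need to be applied separately.

```python
def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field into an int (hypothetical helper)."""
    digits = []
    for b in raw[:-1]:
        digits.append(b >> 4)    # high nibble
        digits.append(b & 0x0F)  # low nibble
    digits.append(raw[-1] >> 4)  # last byte: one digit ...
    sign = raw[-1] & 0x0F        # ... and the sign nibble
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value


positive = unpack_comp3(bytes([0x12, 0x34, 0x5C]))  # digits 1 2 3 4 5, sign C
negative = unpack_comp3(bytes([0x12, 0x3D]))        # digits 1 2 3, sign D
```

Here `positive` is `12345` and `negative` is `-123`.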
Python 3:
import codecs
# Do this for EVERY string field in the config YAML file
decode_text = codecs.getdecoder('cp037')
# decode_text returns (text, bytes_consumed), hence the [0]; chunk is one record from chunks
record_type = decode_text(chunk[0:2])[0]
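To see what the decoder does, here is a short sketch on synthetic bytes (not real RRC data). In cp037 the digits 0 to 9 are bytes 0xF0 to 0xF9, so the record type "01" is stored as 0xF0 0xF1.

```python
import codecs

decode_text = codecs.getdecoder("cp037")

# Synthetic record: record type "01" followed by the digits "000012345".
record = bytes([0xF0, 0xF1]) + "000012345".encode("cp037")

# The decoder returns a tuple: (decoded string, number of bytes consumed).
text, consumed = decode_text(record[0:2])

# bytes.decode is an equivalent shortcut when only the string is needed.
control_no = record[2:11].decode("cp037")
```

Here `text` is `'01'`, `consumed` is `2`, and `control_no` is `'000012345'`.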
Python 3:
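import pandas as pd

# The original wrapper function was not shown; this name is a placeholder.
def records_to_dataframe(chunks, config):
    # All fields of the '01' record type. The first field of that record
    # (RRC-TAPE-RECORD-ID) always holds '01'.
    fields = config['01']
    # The field names are your columns (the record ID itself is skipped)
    cols = list(fields.keys())[1:]
    lines = []
    for chunk in chunks:
        line = []
        for name in cols:
            variable = fields[name]
            start = variable['start']
            length = variable['length']
            raw = chunk[start-1:start-1+length]  # start is 1-based
            # These type labels must match the ones used in your YAML config
            if variable['type'] == 'display':
                # Decode first, then strip: EBCDIC spaces are byte 0x40
                result = decode_text(raw)[0].strip()
            elif variable['type'] == 'packed':
                result = packed(raw)
            else:
                result = None  # unknown type label
            line.append(result)
        lines.append(line)
    df = pd.DataFrame(lines, columns=cols)
    return df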
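Since the question asks for a .csv file, here is a minimal end-to-end sketch using only the standard library. The layout, field names, and records are invented for illustration and are not the real RRC layout. It also assumes the cp037 code page and text-only fields. With a pandas DataFrame, `df.to_csv(...)` would serve the same purpose.

```python
import csv
import io

# Hypothetical layout: (name, 1-based start, length); not the real RRC layout.
layout = [("RECORD-ID", 1, 2), ("NAME", 3, 6)]

# Two synthetic 8-byte records encoded in cp037.
data = "01ALPHA ".encode("cp037") + "01BETA  ".encode("cp037")
record_length = 8

rows = []
for i in range(0, len(data), record_length):
    record = data[i:i + record_length]
    rows.append([
        record[start - 1:start - 1 + length].decode("cp037").strip()
        for _, start, length in layout
    ])

# Written to an in-memory buffer here; for a real file, use
# open("out.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([name for name, _, _ in layout])
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Here `rows` ends up as `[['01', 'ALPHA'], ['01', 'BETA']]`, and `csv_text` holds a header line followed by one line per record.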
Upvotes: 3