user491

Reputation: 175

Read header and footer from mainframe EBCDIC file

There are some solutions available to read EBCDIC files, such as https://github.com/rbheemana/Cobol-to-Hive, but this fails when the EBCDIC file contains rows with unequal offset lengths.

I wrote a MapReduce job to read EBCDIC files and convert them to CSV/Parquet, reading each row based on offset values so that every record has a fixed length. The following is sample code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;

Configuration conf = new Configuration();
conf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 100); // every record is 100 bytes
Job job = Job.getInstance(conf);
job.setInputFormatClass(FixedLengthInputFormat.class);

This also fails when the length of the input EBCDIC file is not evenly divisible by the record length, which is the case when the file has a header and footer whose lengths differ from the data records.
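
For context, a minimal pre-processing sketch that strips a fixed-size header and footer before handing the body to FixedLengthInputFormat might look like the following; the header/footer lengths and file names are assumptions for illustration:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class StripHeaderFooter {
    public static void main(String[] args) throws IOException {
        final int HEADER_LEN = 120; // assumed header length in bytes
        final int FOOTER_LEN = 80;  // assumed footer length in bytes
        File input = new File("input.ebcdic");
        long bodyLen = input.length() - HEADER_LEN - FOOTER_LEN;
        try (FileInputStream fis = new FileInputStream(input);
             FileOutputStream fos = new FileOutputStream("body.ebcdic")) {
            fis.skip(HEADER_LEN); // drop the header
            byte[] buf = new byte[8192];
            long remaining = bodyLen;
            int n;
            while (remaining > 0
                    && (n = fis.read(buf, 0, (int) Math.min(buf.length, remaining))) > 0) {
                fos.write(buf, 0, n); // copy only the fixed-length body records
                remaining -= n;
            }
        }
    }
}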

Is there any way to read and convert an EBCDIC file with a header and footer to an ASCII file?

Upvotes: 0

Views: 1271

Answers (2)

Felipe Martins Melo

Reputation: 1333

Cobrix may be what you're looking for. It is an open-source COBOL data source for Spark.

It supports fixed and variable-length records, which may be related to the issue you're facing.
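
Here is a minimal Spark (Java) sketch of how reading with Cobrix might look. The paths are placeholders, and the option names (copybook, record_format) are from the Cobrix README as I recall them; if memory serves, there are also file_start_offset and file_end_offset options for skipping a header and footer, but please verify against the current docs:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EbcdicToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("EbcdicToParquet").getOrCreate();

        // "cobol" is the Cobrix data source short name; paths are illustrative
        Dataset<Row> df = spark.read()
                .format("cobol")
                .option("copybook", "/path/to/copybook.cpy")
                .option("record_format", "V") // variable-length records
                .load("/path/to/ebcdic/data");

        df.write().parquet("/path/to/output");
        spark.stop();
    }
}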

DISCLAIMER: I work for ABSA and I'm one of the developers behind this library.

Upvotes: 0

Bruce Martin

Reputation: 10543

I do not know much about Hadoop, and I am presuming the file comes from an IBM mainframe (z/OS). Also, looking at https://github.com/rbheemana/Cobol-to-Hive, it looks like it can handle VB files, so there should be a way.

Warning on File transfer

If the file is a VB file on the mainframe, each record will have a Record Descriptor Word (RDW). Some file transfer functions drop the RDW by default. You will probably want the RDW; JRecord can certainly use it.
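
For reference, the RDW is a 4-byte prefix: the first two bytes hold the record length as a big-endian integer (including the RDW itself), and the last two are normally zero. A minimal sketch for walking RDW-prefixed records, assuming the cp037 EBCDIC code page:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class RdwDump {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream("input.vb"))) {
            while (true) {
                int len;
                try {
                    len = in.readUnsignedShort(); // bytes 0-1: record length, big-endian
                } catch (EOFException e) {
                    break; // clean end of file
                }
                in.skipBytes(2); // bytes 2-3: reserved (zero for non-spanned records)
                byte[] record = new byte[len - 4]; // length includes the 4-byte RDW
                in.readFully(record);
                System.out.println(new String(record, "cp037")); // cp037 is an assumption
            }
        }
    }
}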

Possible Solutions

Possible solutions may include:

  • Convert the file to Fixed Width on the mainframe/AS400 before doing the transfer - very easy to do.
  • Extract the header / footer details on the mainframe - very easy.
  • Use JRecord to either extract the header / footer or convert to fixed width - very easy (see the sketch after this list).
  • Look at the CopybookInputFormat project; it is based on JRecord and may work better. It should have better COBOL support.
  • Use JRecord to read the file (you will need to write your own code to load into Hadoop).
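
A rough JRecord sketch, assuming the IOBuilder-style API, a copybook named DTAR020.cbl, and a hypothetical KEYCODE-NO field; adjust the file organization and EBCDIC code page to match your file:

import net.sf.JRecord.JRecordInterface1;
import net.sf.JRecord.Common.Constants;
import net.sf.JRecord.Details.AbstractLine;
import net.sf.JRecord.IO.AbstractLineReader;
import net.sf.JRecord.def.IO.builders.ICobolIOBuilder;

public class ReadVbFile {
    public static void main(String[] args) throws Exception {
        // Build a COBOL reader from a copybook; IO_VB and cp037 are assumptions
        ICobolIOBuilder ioBldr = JRecordInterface1.COBOL
                .newIOBuilder("DTAR020.cbl")
                .setFileOrganization(Constants.IO_VB)
                .setFont("cp037");

        AbstractLineReader reader = ioBldr.newReader("input.bin");
        AbstractLine line;
        while ((line = reader.read()) != null) {
            // Field names come from the copybook; KEYCODE-NO is hypothetical
            System.out.println(line.getFieldValue("KEYCODE-NO").asString());
        }
        reader.close();
    }
}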

Upvotes: 0
