Reputation: 175
There are some solutions available for reading EBCDIC files, e.g. https://github.com/rbheemana/Cobol-to-Hive, but it fails when the EBCDIC file contains rows with unequal offset (record) lengths.
I wrote a MapReduce job that reads EBCDIC files and converts them to CSV/Parquet by reading each row based on an offset value, so the record length is fixed for all rows. The following is sample code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;

Configuration conf = new Configuration();
conf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 100); // every record is exactly 100 bytes
Job job = Job.getInstance(conf);
job.setInputFormatClass(FixedLengthInputFormat.class);
This also fails when the size of the input EBCDIC file is not an exact multiple of the record (offset) length, for example when the file has header and footer records of a different length.
Is there any way to read an EBCDIC file with a header and footer and convert it to an ASCII file?
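For reference, the conversion step I have in mind is roughly the sketch below: decode each fixed-length record from EBCDIC and write it out as text. The Cp037 code page and the mapper wiring are assumptions (the real code page depends on the source system), and this only handles character (DISPLAY) fields, not COMP-3/binary ones; the open question is still how to deal with the header and footer records.

import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Arrays;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Decodes each fixed-length EBCDIC record to text; Cp037 is an assumed code page.
public class EbcdicToTextMapper
        extends Mapper<LongWritable, BytesWritable, NullWritable, Text> {

    private static final Charset EBCDIC = Charset.forName("Cp037");

    @Override
    protected void map(LongWritable offset, BytesWritable record, Context context)
            throws IOException, InterruptedException {
        // FixedLengthInputFormat delivers each 100-byte record as raw bytes.
        byte[] bytes = Arrays.copyOf(record.getBytes(), record.getLength());
        // Decoding with the EBCDIC charset yields a normal Java String,
        // which is then written out as text (ASCII/UTF-8).
        context.write(NullWritable.get(), new Text(new String(bytes, EBCDIC)));
    }
}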
Upvotes: 0
Views: 1271
Reputation: 1333
Cobrix may be what you're looking for. It is an open-source COBOL data source for Spark.
It supports fixed and variable-length records, which may be related to the issue you're facing.
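A minimal sketch of how reading such a file could look from Spark's Java API is below; the paths are placeholders, and the exact options for variable-length records should be checked against the Cobrix documentation for your version:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EbcdicToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ebcdic-to-parquet")
                .getOrCreate();

        // Paths are placeholders; the copybook describes the record layout.
        Dataset<Row> df = spark.read()
                .format("cobol")                              // Cobrix data source
                .option("copybook", "/path/to/copybook.cpy")
                .load("/path/to/ebcdic/data");

        df.write().parquet("/path/to/output");
    }
}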
DISCLAIMER: I work for ABSA and I'm one of the developers behind this library.
Upvotes: 0
Reputation: 10543
I do not know much about Hadoop, and I am presuming the file comes from an IBM mainframe (z/OS). Also, looking at https://github.com/rbheemana/Cobol-to-Hive, it looks like it can handle VB files, so there should be a way.
If the file is a VB file on the mainframe, each record will have a Record Descriptor Word (RDW). Some file transfer functions drop the RDW by default; you will probably want to keep it. Certainly JRecord can use it.
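If you end up handling the RDW yourself, it is a 4-byte prefix on each record: the first two bytes hold the record length in big-endian (in the usual z/OS convention this includes the 4 RDW bytes, though transfer tools vary), and the last two bytes are normally zero. A rough sketch of walking a VB file that way, assuming the RDW was kept and the data is code page Cp037 text:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.Charset;

public class VbRecordDump {
    public static void main(String[] args) throws IOException {
        Charset ebcdic = Charset.forName("Cp037"); // assumed code page

        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            while (true) {
                int length;
                try {
                    length = in.readUnsignedShort(); // RDW bytes 0-1: record length (big-endian)
                } catch (EOFException end) {
                    break;                           // clean end of file
                }
                in.readUnsignedShort();              // RDW bytes 2-3: normally zero

                byte[] record = new byte[length - 4]; // length usually includes the RDW itself
                in.readFully(record);

                // Only character (DISPLAY) fields decode meaningfully this way.
                System.out.println(new String(record, ebcdic));
            }
        }
    }
}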
Possible solutions may include:
Upvotes: 0