Reputation: 465
I have an EBCDIC file in HDFS. I want to load the data into a Spark DataFrame, process it, and write the results as ORC files. I found an open-source solution, Cobrix, that can read EBCDIC files, but the developer must provide a copybook file, which is the schema definition.
A few lines of my EBCDIC file are shown in the attached image. I want to work out the copybook format for this file. Essentially, I want to read three fields: vin, with a length of 17; vin_data, with a length of 3; and vin_val, with a length of 100.
Upvotes: 0
Views: 1051
Reputation: 51553
Based on your comment in the question, and looking at the input file, you could start with this.
       01  VIN-RECORD.
           05  VIN        PIC X(17).
           05  VIN-COUNT  PIC S9(5) COMP-3.
           05  VIN-VALUE  PIC X(100).
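I'm guessing that the second field is COMP-3 because all six examples end with a C in the sign position (the low-order half of the last byte), which indicates a positive COMP-3 value. A D there would indicate a negative COMP-3 value, and an F an unsigned COMP-3 value. PIC S9(5) COMP-3 packs five digits plus the sign into 3 bytes, which matches the field length of 3.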
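To make the sign rule concrete, here is a minimal Python sketch of how a packed-decimal field could be decoded by hand. The function name `unpack_comp3` is hypothetical, and the sketch only accepts the three sign values mentioned above (C, D, F); it is meant for checking a guess against a hex dump, not as a replacement for a proper COBOL data parser.

```python
def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field into an int.

    Every half-byte holds one decimal digit, except the low-order
    half of the last byte, which holds the sign (C, D or F).
    """
    if not raw:
        raise ValueError("empty COMP-3 field")

    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)      # high half-byte
        digits.append(byte & 0x0F)    # low half-byte
    digits.append(raw[-1] >> 4)       # last digit
    sign = raw[-1] & 0x0F             # sign half-byte

    if any(d > 9 for d in digits):
        raise ValueError("non-decimal digit in COMP-3 field")
    if sign not in (0x0C, 0x0D, 0x0F):
        raise ValueError(f"unexpected sign value: {sign:X}")

    value = int("".join(str(d) for d in digits))
    return -value if sign == 0x0D else value


# A 3-byte field holds five digits plus the sign, as in PIC S9(5) COMP-3.
print(unpack_comp3(bytes.fromhex("12345C")))
print(unpack_comp3(bytes.fromhex("12345D")))
```

If the middle three bytes of every record decode cleanly this way, that supports the COMP-3 guess; a `ValueError` on real data would suggest the field has a different usage.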
The content of the third field varies in length and is right-padded with spaces.
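Once the copybook is saved to a file, loading the data with Cobrix and writing ORC could look roughly like the PySpark sketch below. It assumes the Cobrix `spark-cobol` package is already on the Spark classpath; the function name `ebcdic_to_orc` and all three paths are hypothetical placeholders.

```python
def ebcdic_to_orc(spark, copybook_path, input_path, output_path):
    """Read an EBCDIC file through Cobrix and write the result as ORC.

    `spark` is an existing SparkSession. The "cobol" format and the
    "copybook" option are provided by the Cobrix spark-cobol package,
    which must be available to Spark for this to work.
    """
    df = (
        spark.read.format("cobol")
        .option("copybook", copybook_path)  # schema definition
        .load(input_path)                   # EBCDIC data file or directory
    )
    # Any processing of the DataFrame would go here before the write.
    df.write.orc(output_path)
    return df
```

Calling `df.printSchema()` and `df.show()` on the returned DataFrame before trusting the output is a quick way to see whether the guessed layout lines up with the data.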
Upvotes: 1
Reputation: 7297
how to define a copybook file of ebcdic data?
You don't.
A copybook may be used as a record definition (that is, how the data is stored); it has nothing to do with the encoding of the data that may be stored in that record.
This leaves the question "How do I define the record structure?"
You'd need the number of fields, their lengths, and their types (they are likely not all USAGE DISPLAY), and then you just define them with some fancy names. Ideally, you get the original record definition from the COBOL program that writes the file, put it into a copybook if it isn't in one yet, and use that.
Your link has samples that show what a copybook actually looks like. If you struggle with the definition, please edit your question to include the copybook you've defined, and we may be able to help.
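As a rough illustration of what "number of fields, length and type" amounts to, the Python sketch below describes a record as a list of fields and slices one raw record accordingly. The field lengths (17, 3, 100) come from the question; the field names, the `text`/`packed` labels, and the `cp037` code page are assumptions made for the example, and the real file may use a different EBCDIC code page.

```python
# (name, length in bytes, kind) for each field, in record order.
FIELDS = [
    ("VIN", 17, "text"),
    ("VIN-COUNT", 3, "packed"),   # assumed not to be USAGE DISPLAY
    ("VIN-VALUE", 100, "text"),
]


def record_length(fields=FIELDS):
    """Total size of one fixed-length record in bytes."""
    return sum(length for _, length, _ in fields)


def split_record(record: bytes, fields=FIELDS, codepage="cp037"):
    """Slice one record into fields.

    Text fields are decoded from EBCDIC and stripped of trailing
    spaces; other fields are returned as hex for manual inspection.
    """
    if len(record) != record_length(fields):
        raise ValueError("record does not match the declared layout")

    result = {}
    offset = 0
    for name, length, kind in fields:
        chunk = record[offset:offset + length]
        if kind == "text":
            result[name] = chunk.decode(codepage).rstrip()
        else:
            result[name] = chunk.hex().upper()
        offset += length
    return result
```

If the file size is not a multiple of `record_length()`, or the text fields decode to garbage, the layout is probably wrong somewhere, which is why getting the original record definition is the safer route.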
Upvotes: 2