Reputation: 29
First, let me apologize if the information here is incomplete. That is not laziness on my part; I simply don't know COBOL in detail.
My firm has assigned me to extract our old financial data from files that are read by COBOL programs and load it into our Oracle database. I can't read these files as normal text, and I don't know how to convert them to normal text.
According to the COBOL source, each block contains 7 records and each record is 73 characters.
The files are very large, about 3 GB each on average. How can I open them as normal text?
Here is the file section:
000220 ENVIRONMENT DIVISION.
000230 CONFIGURATION SECTION.
000240 SOURCE-COMPUTER. NCR-3000.
000250 OBJECT-COMPUTER. NCR-3000.
000260 INPUT-OUTPUT SECTION.
000270 FILE-CONTROL.
000280 SELECT DQ-HIMVT-A ASSIGN TO DISC
000290 ORGANIZATION INDEXED
000300 ACCESS MODE DYNAMIC
000310 RECORD KEY CLE-A.
000320*
000330 DATA DIVISION.
000340 FILE SECTION.
000350 FD DQ-HIMVT-A BLOCK CONTAINS 7 RECORDS
000360 RECORD CONTAINS 73 CHARACTERS
000370 LABEL RECORD STANDARD
000380 DATA RECORD IS HIMVT-A.
000390 01 HIMVT-A.
000400 02 CLE-A.
000410 03 ENT-A PIC 99.
000420 03 NUCPT-A PIC 9(13) COMP-6.
000430 03 DEV-A PIC XXX.
000440 03 DATOP-A PIC 9(7) COMP-6.
000450 03 SIG-A PIC 9.
000460 03 FORC-A PIC 9.
000470 03 DATVAL-A PIC 9(7) COMP-6.
000480 03 NUMOP-A PIC 9(9) COMP-6.
000490 03 MT-A PIC 9(12)V999 COMP-6.
000500 02 FILLER PIC X(8).
000510 02 TYPCPT-A PIC 9(3) COMP-6.
000520 02 LIBOP-A PIC X(15).
000530 02 SOLD-A PIC S9(12)V999 COMP-3.
000540 02 DATTRAIT-A PIC 9(7) COMP-6.
000550 02 FILLER PIC X.
Here is a sample of the file as it appears when opened in Notepad++:
RMKF I I 0 ** ƒ ’ *B9 *B9 ’ ’ ÿ # "c *B9 Þ #01 EGP %10 % ƒ 21 $ '10 ' (@P )€ 010 0 0 EGP $21 $
%11 $ (EGP $21 $
%11 $ 7EGP $21 $
%11 $ FEGP $21 $
%11 $ UEGP $21 $
%11 $ ` ÿÿÿÿÿÿÿÿÿÿÿÿÿÿ >01 ÔEGP %10 % ÔƒÖ 21Â
NO. 0 ÄÕ
I also found this file, which they call a copybook. I don't know how it is related.
000100*
000200**** CINVDAT - ZONE DE TRAVAIL ****
000300*******************************************
000400****
000500*
000600 01 INVDATRAV.
000700 03 INVZON1 PIC 99.
000800 03 INVZON2 PIC 99.
000900 03 INVZON3 PIC 99.
001000 01 INVZONI PIC 99.
001100 01 INVDATE PIC 9(6).
001200 01 INVCAL PIC 9.
001300*
Regards
Upvotes: 1
Views: 2700
Reputation: 23
I'm not sure which system you are using. In my experience on the AS400, COBOL data files are stored in EBCDIC format, so they cannot be opened directly in a text editor; they only show what looks like random text. You have to convert the data to ASCII before you export it. On the AS400, I use CHGTOPCD to copy the file/member to a directory and then export it; after that it shows the correct text. I'm not sure whether this information helps you.
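If the file does turn out to be EBCDIC, the conversion can also be done off the host. The following is a minimal sketch in Python, assuming code page `cp037` (US/Canada EBCDIC); the right code page depends on the originating system, and the function name `ebcdic_to_text` is made up for illustration. Note that "EGP" is readable in the Notepad++ sample in the question, which suggests the text fields in that particular file may already be ASCII.

```python
def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC-encoded bytes into a Python string.

    Assumption: cp037 is the code page in use. Only apply this to text
    fields (PIC X and unpacked PIC 9); packed COMP fields hold binary
    digits, and character conversion would corrupt them.
    """
    return raw.decode(codepage)


# "EGP" encoded in EBCDIC cp037 is the byte sequence C5 C7 D7.
sample = bytes.fromhex("c5c7d7")
print(ebcdic_to_text(sample))  # EGP
```

Converting a whole record in one go is only safe when every field in it is plain text, so for mixed records the decoding is best done field by field.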
Upvotes: 2
Reputation: 13076
You may be able to locate a service which can do the extract for you. If you go this route, ensure that they have all the information you can provide (which must include the data-definitions under the FD) and agree to only pay on verified receipt of the data.
An alternative is to talk to Micro Focus about a short-term license for a COBOL compiler which can understand the indexed-file format (again, this must be guaranteed). You then write one simple program per file whose data you need to extract. The advantage here is that you don't need to know what COMP-3 and COMP-6 represent, because the conversion to a "text" number is done without anyone having to think about it: on the output definition, you remove all references to COMP-anything (and to plain COMP, if there happen to be any).
A further alternative is to sit down with a hex editor and knowledge of the data, and work out how to separate the index information from the data (all the data records are of a known, fixed length, 73 bytes in your example).
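Once the layout is understood, reading the fixed-length records is the easy part. Here is a rough sketch in Python, assuming the data records sit contiguously from some byte offset onward; that is an assumption, since a real indexed file may interleave index blocks or per-block headers with the data. The names `iter_records` and `data_offset` are hypothetical.

```python
import io
from typing import BinaryIO, Iterator

RECORD_LEN = 73  # from "RECORD CONTAINS 73 CHARACTERS" in the FD


def iter_records(stream: BinaryIO, data_offset: int = 0,
                 record_len: int = RECORD_LEN) -> Iterator[bytes]:
    """Yield fixed-length records one at a time, starting at data_offset.

    Reads record by record, so a 3 GB file never has to fit in memory.
    A trailing partial record (fewer than record_len bytes) is ignored.
    """
    stream.seek(data_offset)
    while True:
        chunk = stream.read(record_len)
        if len(chunk) < record_len:
            break
        yield chunk


# Demo on fake data: a 10-byte "header", two full records, 5 stray bytes.
fake = io.BytesIO(b"H" * 10 + b"A" * 73 + b"B" * 73 + b"C" * 5)
records = list(iter_records(fake, data_offset=10))
print(len(records))  # 2
```

With a real file, you would pass `open(path, "rb")` instead of the `BytesIO` object, and the offset would come from what the hex editor reveals.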
Then, with your preferred language which can handle non-delimited (fixed-length) binary records, you read the data and work out what COMP-3, COMP-6, and any other COMP- (or plain COMP) fields mean. They will likely be packed-decimal, Binary Coded Decimal (BCD) or "some type of binary", given that Standard COBOL has binary fields limited by decimal values (to the size of the PICture clause).
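As an illustration of what that decoding might look like, here is a sketch in Python. It assumes COMP-6 is unsigned packed BCD (two digits per byte, no sign nibble) and COMP-3 is packed decimal with a trailing sign nibble; both are assumptions that need checking against the actual compiler. The function names are made up. One point in favour of these assumptions: under them, the field sizes in the FD add up to exactly 73 bytes.

```python
from decimal import Decimal


def unpack_comp6(raw: bytes, scale: int = 0) -> Decimal:
    """Decode unsigned packed BCD (assumed meaning of COMP-6).

    scale is the number of digits after the implied decimal point (the V).
    """
    digits = raw.hex()
    if not digits.isdigit():
        raise ValueError(f"not valid BCD: {digits!r}")
    return Decimal(int(digits)).scaleb(-scale)


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal with a trailing sign nibble (assumed COMP-3).

    Assumes the conventional sign nibbles: D or B negative, others positive.
    """
    nibbles = raw.hex()
    digits, sign = nibbles[:-1], nibbles[-1:]
    if not digits.isdigit() or sign not in set("abcdef"):
        raise ValueError(f"not valid packed decimal: {nibbles!r}")
    value = Decimal(int(digits)).scaleb(-scale)
    return -value if sign in "bd" else value


# Size check against the FD: DISPLAY fields take one byte per character.
DISPLAY_BYTES = 2 + 3 + 1 + 1 + 8 + 15 + 1   # ENT, DEV, SIG, FORC, FILLER, LIBOP, FILLER
COMP6_DIGITS = [13, 7, 7, 9, 15, 3, 7]       # NUCPT, DATOP, DATVAL, NUMOP, MT, TYPCPT, DATTRAIT
COMP3_DIGITS = [15]                          # SOLD
record_len = (DISPLAY_BYTES
              + sum((n + 1) // 2 for n in COMP6_DIGITS)   # no sign nibble
              + sum(n // 2 + 1 for n in COMP3_DIGITS))    # digits plus sign nibble
print(record_len)  # 73

print(unpack_comp6(bytes.fromhex("01234567")))                   # 1234567
print(unpack_comp3(bytes.fromhex("000000001234567d"), scale=3))  # -1234.567
```

Using `Decimal` rather than `float` keeps financial amounts exact, which matters when the values are going into an Oracle NUMBER column.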
With the first and second alternatives, you can expect a more reliable extract. The third may be the "cheapest", but it is harder to estimate how long it will take, and harder to stick to that estimate.
Of the first two, cost is the likely determinant (assuming you are not going to use COBOL going forward). If you have to write some COBOL programs yourself, don't worry about that: they are very, very simple, and once you have done one, you simply "clone" it.
Upvotes: 6