Pavel Matras

Reputation: 349

COBOL reading sequential line file, count characters

In COBOL I am reading a line sequential file, line by line, until EOF, something like this:

           read bank-file  at end
            move 'Y'  to end-of-bank

The lines have a variable length of 40 to 80 characters, and I need to know how many characters are on each line. A line can end with some spaces, which I need to count too, so I can't take the length of the string from the variable in the program. Is there any return value from the READ statement that gives the number of characters in the line that was read (up to the CRLF)?

Upvotes: 1

Views: 3746

Answers (2)

Randy B.

Reputation: 1

Just in case you still don't know how many bytes you have, try this:

The wonderful thing about COBOL on Unix/Linux/PCs is that, for the most part, the runtimes do not check the file structure. They assume you were bright enough to tell the program what the file is, and in the case of a complicated file, such as one with an MF COBOL B-tree index embedded in it, the file header does the rest.

My first exposure to MF COBOL had users ending up with corrupt files all the time, and we needed a way to know quickly what was wrong. I leveraged this fact and parsed the files looking for certain features, such as an x'0A' (UNIX) or a CR/LF, which would tell us that someone had FTP'd a file from a PC to Linux using binary transfer. It did exactly what we had hoped, and we eventually released it as an end-user utility.

Based on this, you COULD just tell the program that the file has 1-byte records and read each byte as a binary sequential record. This lets you count the bytes as they go by. Change the file definition to BINARY SEQUENTIAL with a record of PIC X(01). Since you state that the record terminator is CR/LF, you will need a 2-byte field for pattern recognition, so that you can subtract the delimiter bytes from the count.

SELECT SOME-FILE
    ASSIGN TO "someFile.txt"
    ORGANIZATION IS BINARY SEQUENTIAL.

 DATA DIVISION.
 FILE SECTION.

 FD SOME-FILE.
 01 SOME-BYTE PIC X(01).

 WORKING-STORAGE SECTION.
 01 PATTERN-BUFFER.
    05  PB-01  PIC X(01).
    05  PB-02  PIC X(01).
 01  BYTE-COUNT      PIC 9(9) VALUE ZERO.
 01  END-OF-SOME-FILE   PIC X(01) VALUE IS "N".

PROCEDURE DIVISION.
MAIN.
  OPEN INPUT SOME-FILE.
  READ SOME-FILE
  AT END
     CLOSE SOME-FILE
     DISPLAY  "BYTE-COUNT: 0"
     STOP RUN
  NOT AT END
      MOVE 1 TO BYTE-COUNT
*>    Remember the first byte so the pattern check sees it
      MOVE SOME-BYTE TO PB-02
      PERFORM UNTIL END-OF-SOME-FILE = "Y"
*>       Each READ returns one 1-byte record
         READ SOME-FILE
           AT END MOVE "Y" TO END-OF-SOME-FILE
              CLOSE SOME-FILE
              DISPLAY BYTE-COUNT
              STOP RUN
           NOT AT END
              ADD 1 TO BYTE-COUNT
              MOVE PB-02 TO PB-01
              MOVE SOME-BYTE TO PB-02
              IF PATTERN-BUFFER = x'0D0A'
                 SUBTRACT 2 FROM BYTE-COUNT
              ELSE
*>               Flag byte before a control character: see note below
                 IF PB-01 = x'00' AND PB-02 < x'20'
                    SUBTRACT 1 FROM BYTE-COUNT
                 END-IF
              END-IF
         END-READ
      END-PERFORM
  END-READ.

MF COBOL can optionally do two things to LINE SEQUENTIAL files that can mess with your count.

The first is removing all trailing blanks. According to the spec this should be fine, since you want the number of bytes actually stored.

The second is flagging characters that could be misinterpreted under certain conditions. This applies especially to carriage-control characters, which can look like binary integer values. If MF COBOL sees a value lower than the ASCII value of a space, it places a binary zero in a flag byte before it. This flag byte takes up space in the file but is not data; it is a file structure marker and would not normally appear in your count. Because we made the file binary sequential, however, it is not removed at read time. So if you see a LOW-VALUE (x'00') followed by a character with a value less than x'20', reduce your byte count by 1.

Upvotes: 0

MC Emperor

Reputation: 22977

Edit

As mentioned in the comments, it actually is possible to get the number of characters (bytes) read, indeed with the RECORD VARYING DEPENDING ON clause:

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.

    SELECT SOME-FILE
        ASSIGN TO "someFile.txt"
        ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.

FD SOME-FILE
    RECORD VARYING 40 TO 80 DEPENDING ON SOME-LINE-LENGTH.

 01 SOME-LINE PIC X(80).

WORKING-STORAGE SECTION.

 77 SOME-LINE-LENGTH PIC 9(3).

Now for each read, the record length is stored into SOME-LINE-LENGTH:

READ SOME-FILE NEXT RECORD
DISPLAY SOME-LINE-LENGTH

I don't know exactly which vendors support it (possibly almost all), but at least it works with ACUCOBOL.
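
For completeness, here is a minimal sketch of a whole program built around the fragments above. It reads the file to the end and displays the length of each line. The names LINELEN, SOME-FILE-STATUS and END-OF-SOME-FILE are my own additions for illustration, and the source is written in free format (for example, GnuCOBOL's `cobc -x -free`), which is an assumption. Whether trailing spaces or a stray CR are included in the reported length depends on the runtime and its configuration, so check this against a file with known line lengths.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. LINELEN.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT SOME-FILE
        ASSIGN TO "someFile.txt"
        ORGANIZATION IS LINE SEQUENTIAL
        FILE STATUS IS SOME-FILE-STATUS.

DATA DIVISION.
FILE SECTION.
FD  SOME-FILE
    RECORD IS VARYING IN SIZE FROM 40 TO 80 CHARACTERS
        DEPENDING ON SOME-LINE-LENGTH.
01  SOME-LINE             PIC X(80).

WORKING-STORAGE SECTION.
*> Set by the runtime after every successful READ
77  SOME-LINE-LENGTH      PIC 9(3).
*> Hypothetical helper fields, not part of the fragments above
77  SOME-FILE-STATUS      PIC X(2).
77  END-OF-SOME-FILE      PIC X(1) VALUE "N".

PROCEDURE DIVISION.
MAIN-PARA.
    OPEN INPUT SOME-FILE
    IF SOME-FILE-STATUS NOT = "00"
        DISPLAY "OPEN FAILED, STATUS: " SOME-FILE-STATUS
        STOP RUN
    END-IF
    PERFORM UNTIL END-OF-SOME-FILE = "Y"
        READ SOME-FILE
            AT END
                MOVE "Y" TO END-OF-SOME-FILE
            NOT AT END
                DISPLAY "LENGTH: " SOME-LINE-LENGTH
        END-READ
    END-PERFORM
    CLOSE SOME-FILE
    STOP RUN.
```

Only the first SOME-LINE-LENGTH characters of SOME-LINE belong to the current line, so use reference modification such as SOME-LINE(1:SOME-LINE-LENGTH) when processing it.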


Original post

As far as I know, the READ statement gives no feedback on the number of bytes read. Apparently, the bytes are stored directly into the record described under the file description (FD) entry in your FILE SECTION.

However, you can calculate the number of bytes read by counting the number of characters written to the record.
First, initialize the file record to LOW-VALUES. Then read the next record; that moves the bytes read into the record. When the number of bytes read is smaller than the record size, the bytes at the end of the record are left unchanged.

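MOVE LOW-VALUES TO YOUR-RECORD
READ YOUR-FILE NEXT RECORD
*> Scan backwards from the last position of the record; stop at 1,
*> because position 0 is not valid for reference modification
PERFORM VARYING SOME-COUNTER FROM FUNCTION LENGTH(YOUR-RECORD) BY -1
        UNTIL SOME-COUNTER < 1
    IF NOT (YOUR-RECORD(SOME-COUNTER : 1) = LOW-VALUES)
        EXIT PERFORM
    END-IF
END-PERFORM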

SOME-COUNTER will contain the line length, assuming no NUL values are present in the file.

I guess this will be time-consuming when the number of lines is large, but at least you get your line lengths.


As Bill Woodger already mentioned, since you didn't provide additional details, I had to make some assumptions.

I'm running MicroFocus ACUCOBOL-GT on Windows 10 myself.

Upvotes: 1
