Joseph Karl

Reputation: 55

Is it possible to read a file into COBOL with a variable length data record?

After many years away from programming, I've decided to take it up again for fun, and I find myself enjoying it quite a lot. While looking for things to code, I found this data, which is openly available from Network Rail in the United Kingdom.

Among other things, you can obtain schedule data, which is a list of all train, bus and ferry journeys.

A schedule record for a train journey might look like this:

BSNY819581902281902280001000 PEE5A99    122112002 EMU390 125                  
BX         VTY                                                                  
LOMNCRPIC 2131 00008  FL     TB                                                 
LIARDWCKJ           2133 00000000                                               
LISLDLJN            2134H00000000   SL                   H                      
LIHTNOJN            2138 00000000   FL                  1H                      
LISTKP              2140H000000002  SL                                          
LISTKPE1            2141H00000000                                               
LIADSWDRD           2142H00000000                                               
LICHDH              2143 000000002                                              
LIWLMSL             2146 000000004                      1                       
LIALDEDGE           2148 00000000                     1 3                       
LISBCH              2200 000000001  FL                                          
LICREWSBG           2203 00000000                                               
LICREWUML           2204 00000000                                               
LICREWE             2206 000000001  FL                   H                      
LICREWBHJ           2207H00000000                       1                       
LIMADELEY           2212 00000000   FL FL               5H                      
LINTNB              2222H00000000   FL FL                                       
LISTAFFDJ           2226H00000000   SL                                          
LISTAFFRD 2228H2231H     000000004  SL SL A C                                   
LISTAFTVJ           2233 00000000                                               
LIPNKRDG            2236H00000000                     1 1                       
LIBSBYJN            2244 00000000                                               
LIPBLJWM            2248 00000000                                               
LIDRLSTNJ           2251H00000000                                               
LIBSCTSTA           2252H00000000                     1                         
LIPRYBRNJ           2257 00000000                       7                       
LIASTON             2306H000000002                                              
LISTECHFD           2311H00000000                                               
LIBHAMINT           2315 000000004                                              
LIBKSWELL           2318H00000000                       1H                      
LICOVNTRY           2324 000000001                    2 3                       
LIRUGBTVJ           2336 00000000   UNL                 3                       
LIRUGBY             2340 000000005  UNLUNL              1H                      
LIHMTNJ             2343 00000000                       1H                      
LIDVNTYNJ           2346H00000000                                               
LILNGBKBY           2351 00000000                     1 1                       
LINMPTN             0001H000000001                      6                       
LIHANSLPJ           0016 00000000   SL                                          
LIMKNSCEN           0021 000000001  SL SL                                       
LIBLTCHLY           0023 000000004  SL SL             5 1H                      
LILEDBRNJ           0036 00000000   SL SL               2                       
LITRING             0042 000000002  SL SL               2H                      
LIBONENDJ           0048H00000000   SL SL               1H                      
LIWATFDJ            0056 000000009  SL SL               1H                      
LIHROW              0101H000000006  SL SL               3                       
LIWMBY              0107H000000006  CL                                          
LTWMBYICD 0117H0000      TF 

tl;dr the first two lines describe what type of train is running, when it runs, how fast, etc. The other lines describe the points the train will pass, and the times at which it is expected to do so. The main takeaway is that each record has a different length depending on the journey.

When I saw this, I thought "This would be a great thing to try and mess around with in COBOL." I went to polytech and learned PASCAL and COBOL, but only had to deal with files with a consistent length and consistent data, not something like this.

I spent a couple of hours trying to find some sort of answer to this on Google, but nothing really showed, hence my asking.

Just for reference, I have managed to do this in GW-BASIC, and could do it in elementary Python if needed, but COBOL, being what it is, is a whole different kettle of fish.

Is it possible to read something like this into COBOL without having to resort to witchcraft, or is it just in the "too hard" basket? I'm only doing this for fun, so it's really no big deal.

Any responses or feedback would be most welcome.

Many thanks,

Joseph.

Upvotes: 1

Views: 2028

Answers (4)

COBOL input files can have VARIABLE or FIXED length records. If the file is variable, the first positions of each record hold the length of that record.

IBM's official documentation covers this: Processing Files with Variable Length Records
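A minimal sketch of what reading such a file could look like, assuming the file really was written in a variable-record format (not a plain newline-delimited text file). The file name, the 80-character maximum and all data names here are made up for illustration, and the on-disk layout of the length prefix depends on the compiler and platform.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. VARLEN.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    "schedule.dat" is a placeholder file name.
           SELECT SCHED-FILE ASSIGN TO "schedule.dat"
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
      * After each READ the runtime stores the record length
      * in WS-REC-LEN (maximum of 80 is an assumption).
       FD  SCHED-FILE
           RECORD IS VARYING IN SIZE FROM 1 TO 80 CHARACTERS
               DEPENDING ON WS-REC-LEN.
       01  SCHED-REC               PIC X(80).
       WORKING-STORAGE SECTION.
       01  WS-REC-LEN              PIC 9(4) COMP.
       01  WS-EOF                  PIC X VALUE 'N'.
           88  END-OF-FILE         VALUE 'Y'.
       PROCEDURE DIVISION.
           OPEN INPUT SCHED-FILE
           PERFORM UNTIL END-OF-FILE
               READ SCHED-FILE
                   AT END
                       SET END-OF-FILE TO TRUE
                   NOT AT END
      *                Show only the bytes that belong to this record.
                       DISPLAY WS-REC-LEN ": "
                           SCHED-REC(1:WS-REC-LEN)
               END-READ
           END-PERFORM
           CLOSE SCHED-FILE
           STOP RUN.
```

For a text file like the one in the question, where each line simply ends with a newline, this record format would probably not apply as-is.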

Upvotes: 1

James

Reputation: 331

To expand on @Simon Sobisch's answer.

Looking at the data, and trying to work it out, I can see these things.

Top two lines as you say are the type of train and that.

Then you have a line starting LO which must be the start of the journey. The next 7 characters would be the station, with MNCRPIC being presumably "Manchester Piccadilly". Then there's a space, then four digits which would be a time.

Then you have a load of lines starting LI which are intermediate points. Some of those have a "H" after the time, others don't. This will be a problem if you're going to do UNSTRING DELIMITED BY SPACE. I'm going to presume that H means Halt.

LISTAFFRD 2228H2231H     000000004  SL SL A C

is an odd looking line.

At the end we have LT which is the end of the journey, arriving at WMBYICD at 0117.

   01 TRAIN-SCHEDULE.
       03 RECORD-TYPE PIC XX.
           88 JOURNEY-START VALUE 'LO'.
           88 JOURNEY-INTERMEDIATE VALUE 'LI'.
           88 JOURNEY-TERMINATE VALUE 'LT'.
       03 TRAIN-STATION PIC X(7).
       03 FILLER PIC X(11).
       03 TRAIN-TIME.
           05 TRAIN-TIME-HH PIC 99.
           05 TRAIN-TIME-MM PIC 99.
       03 TRAIN-HALT-FLAG PIC X.
           88 TRAIN-STOPS-HERE VALUE 'H'.

And so on.
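As a rough illustration of how the 88-level names in that layout could be used, here is a small sketch that moves one sample line from the question into the record and branches on the record type. The program name, the trailing FILLER that pads the record to 80 characters, and the DISPLAY texts are my own assumptions. Note that in the sample data the LO line has its time directly after the station code, so this layout as written only lines up with the LI lines that carry a single time in the later column.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RECTYPE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Layout from the answer, padded to 80 characters (assumed).
       01  TRAIN-SCHEDULE.
           03  RECORD-TYPE             PIC XX.
               88  JOURNEY-START           VALUE 'LO'.
               88  JOURNEY-INTERMEDIATE    VALUE 'LI'.
               88  JOURNEY-TERMINATE       VALUE 'LT'.
           03  TRAIN-STATION           PIC X(7).
           03  FILLER                  PIC X(11).
           03  TRAIN-TIME.
               05  TRAIN-TIME-HH       PIC 99.
               05  TRAIN-TIME-MM       PIC 99.
           03  TRAIN-HALT-FLAG         PIC X.
               88  TRAIN-STOPS-HERE        VALUE 'H'.
           03  FILLER                  PIC X(55).
       PROCEDURE DIVISION.
      * Sample line copied from the question.
           MOVE "LISLDLJN            2134H00000000" TO TRAIN-SCHEDULE
      * Branch on the condition names instead of comparing literals.
           EVALUATE TRUE
               WHEN JOURNEY-START
                   DISPLAY "Origin: " TRAIN-STATION
               WHEN JOURNEY-INTERMEDIATE
                   DISPLAY "Via " TRAIN-STATION " at "
                       TRAIN-TIME-HH ":" TRAIN-TIME-MM
               WHEN JOURNEY-TERMINATE
                   DISPLAY "Terminus: " TRAIN-STATION
               WHEN OTHER
                   DISPLAY "Header or unknown record: " RECORD-TYPE
           END-EVALUATE
           IF TRAIN-STOPS-HERE
               DISPLAY "H flag set"
           END-IF
           STOP RUN.
```

Because every field sits at a fixed column, no splitting on spaces is needed; the MOVE alone makes each sub-field addressable by name.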

Upvotes: 1

Simon Sobisch

Reputation: 7287

Actually (after reformatting the question) I think COBOL is perfect for this job, as the data is fixed-length (it may have come out of COBOL, too...)

  • define the file (line-sequential may not even be needed if the file, like your post, even contains the trailing spaces; but as this may change, line-sequential would be fine)
  • OPEN INPUT file, READ until end of file
  • put the complete record read into a local record with defined sub-fields and just access the data from the sub-fields, after validating (the file may be broken for many reasons)

Depending on the number of lines in that data, you may process the records directly, move them to a table (have a look at OCCURS), or WRITE them to another file (likely an INDEXED one with multiple KEY definitions).
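The steps in the list above could be sketched roughly like this. It is only an outline under a few assumptions: a compiler that supports LINE SEQUENTIAL (e.g. GnuCOBOL or Micro Focus), lines of at most 80 characters as in the sample, and a placeholder file name. The sub-field split is deliberately minimal, and the "validation" is just a check on the record type.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. READSCHED.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *    "schedule.cif" is a placeholder file name.
           SELECT SCHED-FILE ASSIGN TO "schedule.cif"
               ORGANIZATION IS LINE SEQUENTIAL
               FILE STATUS IS WS-STATUS.
       DATA DIVISION.
       FILE SECTION.
       FD  SCHED-FILE.
       01  SCHED-LINE              PIC X(80).
       WORKING-STORAGE SECTION.
       01  WS-STATUS               PIC XX.
       01  WS-EOF                  PIC X VALUE 'N'.
           88  END-OF-FILE         VALUE 'Y'.
       01  WS-COUNT                PIC 9(7) VALUE 0.
      * Local copy of the line with named sub-fields (assumed split).
       01  WS-RECORD.
           05  WS-TYPE             PIC XX.
           05  WS-BODY             PIC X(78).
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT SCHED-FILE
           IF WS-STATUS NOT = "00"
               DISPLAY "Cannot open file, status " WS-STATUS
               STOP RUN
           END-IF
           PERFORM UNTIL END-OF-FILE
               READ SCHED-FILE INTO WS-RECORD
                   AT END
                       SET END-OF-FILE TO TRUE
                   NOT AT END
                       PERFORM HANDLE-RECORD
               END-READ
           END-PERFORM
           CLOSE SCHED-FILE
           DISPLAY "Location records: " WS-COUNT
           STOP RUN.
       HANDLE-RECORD.
      *    Validate before use: only count the location record types.
           IF WS-TYPE = "LO" OR "LI" OR "LT"
               ADD 1 TO WS-COUNT
           END-IF.
```

From HANDLE-RECORD you could just as well move the fields into an OCCURS table or WRITE them to an indexed file, as mentioned above.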

Upvotes: 2

Bruce Martin

Reputation: 10543

Yes, it is possible. For the file, use Line Sequential organization.

The file definition

select lineseq assign to "lineseq.dat"
     organization is line sequential.

To split the lines up, use UNSTRING, e.g.

UNSTRING in-line
   DELIMITED BY SPACES
   into  item-1, item-2, item-3
END-UNSTRING
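A small, self-contained sketch of that UNSTRING, using one line from the question as input. The field sizes and names are assumptions. It uses DELIMITED BY ALL SPACES so that a run of blanks counts as one delimiter; with plain DELIMITED BY SPACES, every single space is a delimiter and runs of blanks produce empty fields.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SPLITLINE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  IN-LINE                 PIC X(80).
       01  ITEM-1                  PIC X(20).
       01  ITEM-2                  PIC X(20).
       01  ITEM-3                  PIC X(20).
       01  FIELD-COUNT             PIC 99 VALUE 0.
       PROCEDURE DIVISION.
      * Sample line copied from the question.
           MOVE "LIARDWCKJ           2133 00000000" TO IN-LINE
      * Clear the targets; UNSTRING leaves unused ones untouched.
           MOVE SPACES TO ITEM-1 ITEM-2 ITEM-3
           UNSTRING IN-LINE
               DELIMITED BY ALL SPACES
               INTO ITEM-1 ITEM-2 ITEM-3
               TALLYING IN FIELD-COUNT
           END-UNSTRING
      * FIELD-COUNT is the number of receiving fields filled.
           DISPLAY "Fields found: " FIELD-COUNT
           DISPLAY "1=" ITEM-1
           DISPLAY "2=" ITEM-2
           DISPLAY "3=" ITEM-3
           STOP RUN.
```

Be aware that splitting on spaces only works when the fields are actually separated by blanks. A line such as "LISLDLJN            2134H00000000" has no blank between the time, the H and the digits that follow, so those would land in a single item.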

It is probably easier to do in a language like Python.

Upvotes: 2
