sandy

Reputation: 347

Reading COBOL code with .NET to generate a call graph

I am working on a project that automatically analyses COBOL code to generate a class diagram. I am developing it as a .NET console application. I need help tracking down the name of the procedure in which a PERFORM statement is used, as in the example below.

    Z-POST-COPYRIGHT.
       move 0 to RETURN-CODE
       perform Z-WRITE-FILE

How do I track the procedure name 'Z-POST-COPYRIGHT' from which the procedure 'Z-WRITE-FILE' is called? The only idea I could think of in terms of COBOL is to use indentation, as the procedure names are always indented. Ideally, the database should store both the procedure name that follows the word 'perform' and the procedure under which it is called (in this case Z-POST-COPYRIGHT).

Upvotes: 0

Views: 525

Answers (1)

Simon Sobisch

Reputation: 7297

I assume you want to do this "on your own" without external tools (a faster approach can be found at the end).

You first have to "know" your source:

  • which compiler was it compiled with (get a manual for this compiler)
  • which options were used

Then you have to preparse the source:

  • include copybooks (doing the given REPLACING rules if any)
  • if the source is in fixed-form reference format: concatenate the contents of the previous line and the current line if you find a - in column 7
  • check for REPLACE and change the result accordingly
  • remove all comments: in fixed-form reference format maybe only lines with * or / in column 7 (similar rules apply to extensions such as "variable" format or "terminal" format); in free-form reference format maybe only inline comments (*>); otherwise possibly also inline comments (*>) or compiler-specific extensions like |. Depending on the further re-engineering you want to do, it could be a good idea to extract the comments and store them, at least with a line number reference
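
As a rough illustration of the comment and continuation steps, here is a minimal Python sketch (the same logic ports directly to C#). It assumes plain fixed-form reference format (indicator in column 7, program text in columns 8 to 72), ignores copybooks, REPLACE and continued literals, and would wrongly cut a `*>` that sits inside a literal. The function name `preparse_fixed_form` is made up for this example.

```python
def preparse_fixed_form(lines):
    """Strip comments and join continuation lines (fixed-form only).

    Returns (line_number, text) pairs so that later steps can still
    refer back to the original source line.
    """
    result = []  # entries are [first_line_number, text]
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n").ljust(7)  # make sure column 7 exists
        indicator = line[6]                 # column 7
        text = line[7:72]                   # columns 8 to 72
        if indicator in "*/":               # full-line comment
            continue
        comment_at = text.find("*>")        # inline comment (simplified)
        if comment_at != -1:
            text = text[:comment_at]
        text = text.rstrip()
        if indicator == "-" and result:     # continuation of previous line
            result[-1][1] += text.lstrip()
            continue
        if text.strip():
            result.append([number, text])
    return [(number, text) for number, text in result]
```

The line number kept with each entry is that of the first physical line, which is usually enough for reporting where a PERFORM was found.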

Then you can finally track the procedure name with the following rule:

  • go backwards to the last separator period (there are more rules, but the rule "at least one line break, another period, a space, a comma or a semicolon" [I've never seen the last two in real code, but it is possible] should be enough)
  • check if there is only one word between this separator period and the next
  • if this word is no reserved COBOL word (this depends on your compiler) it is very likely a procedure name
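
The rule above can be sketched in Python roughly as follows. Instead of searching backwards from every PERFORM, this version scans forward once and remembers the most recent one-word sentence, which gives the same result. It assumes the input is already preparsed procedure division text; the `RESERVED` set is a tiny, incomplete placeholder that you would replace with the reserved word list of your compiler, and sections (`NAME SECTION.`) are not handled. All names are invented for this example.

```python
import re

# Placeholder only: take the real list from your compiler's manual.
RESERVED = {"EXIT", "GOBACK", "CONTINUE", "END-IF", "END-PERFORM",
            "END-EVALUATE", "UNTIL", "VARYING", "WITH", "TEST",
            "FOREVER", "TIMES"}

# A word, or a period followed by whitespace / end of text (separator period).
TOKEN = re.compile(r"[A-Za-z0-9-]+|\.(?=\s|$)")


def find_performs(source):
    """Return (calling_procedure, performed_procedure) pairs."""
    calls = []
    current = None   # procedure name seen most recently
    sentence = []    # words since the last separator period

    def flush():
        nonlocal current
        words = [word.upper() for word in sentence]
        if len(words) == 1 and words[0] not in RESERVED and not words[0].isdigit():
            current = words[0]            # very likely a procedure name
        else:
            for i, word in enumerate(words[:-1]):
                if word != "PERFORM":
                    continue
                target = words[i + 1]
                following = words[i + 2] if i + 2 < len(words) else ""
                # Skip inline forms such as PERFORM UNTIL / PERFORM n TIMES.
                if target in RESERVED or target.isdigit() or following == "TIMES":
                    continue
                calls.append((current, target))
        sentence.clear()

    for token in TOKEN.findall(source):
        if token == ".":
            flush()
        else:
            sentence.append(token)
    flush()  # text after the last period, as in the question's snippet
    return calls
```

For the snippet in the question this returns the single pair `("Z-POST-COPYRIGHT", "Z-WRITE-FILE")`, which is what you would store in the database.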

Start from here and check the output, then fine-tune the rule based on the actual false positives or missing entries.

If you want to do more than only extract the procedure names for PERFORM and GO TO (you should at least check the sources for PERFORM ... THRU), then this can become a lot of work...
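
To show why PERFORM ... THRU matters for a call graph: one statement executes every paragraph from the first name to the second, so each of them should become an edge. A possible Python sketch, assuming you already have the procedure names in source order (the `paragraphs` argument) and preparsed text; the function and pattern names are invented, and an unknown name raises `ValueError`:

```python
import re

# PERFORM <first> THRU|THROUGH <last>, case-insensitive; the lookbehind
# avoids matching inside a longer word such as X-PERFORM.
PERFORM_THRU = re.compile(
    r"(?<![A-Za-z0-9-])PERFORM\s+([A-Za-z0-9-]+)\s+(?:THRU|THROUGH)\s+([A-Za-z0-9-]+)",
    re.IGNORECASE,
)


def perform_ranges(source, paragraphs):
    """Expand each PERFORM ... THRU into the list of paragraphs it covers.

    paragraphs: upper-case procedure names in the order they are defined.
    """
    ranges = []
    for match in PERFORM_THRU.finditer(source):
        first, last = (name.upper() for name in match.groups())
        start = paragraphs.index(first)
        end = paragraphs.index(last)
        ranges.append(paragraphs[start:end + 1])
    return ranges
```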


Faster approach with external tools:

  • run a COBOL compiler on the complete sources and tell it to do the preparsing only - this way the whole preparsing step described above is already solved
  • if you have the option: tell the compiler or an external tool to create a symbol table / cross-reference - this will tell you the name of each procedure and the line it is defined in (you can then find the enclosing procedure of a PERFORM simply by comparing line numbers)
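
The "comparing the line" part can be as small as a binary search. A Python sketch, assuming you have already read the cross-reference into a list of `(definition_line, name)` tuples sorted by line; the listing format differs per compiler, so the parsing is left out and the names here are invented:

```python
from bisect import bisect_right


def enclosing_procedure(procedures, line):
    """Return the name of the procedure that contains the given line.

    procedures: list of (definition_line, name), sorted by definition_line.
    Returns None when the line lies before the first procedure.
    """
    definition_lines = [definition_line for definition_line, _ in procedures]
    index = bisect_right(definition_lines, line) - 1
    return procedures[index][1] if index >= 0 else None
```

With procedures defined at lines 10 and 20, a PERFORM found at line 12 maps to the procedure defined at line 10.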

Just a note: You may want to check GnuCOBOL (formerly OpenCOBOL) for the preparsing and/or generation of symbol tables/cross-reference and/or printcbl for a completely external tool doing preparsing and/or cobxref for a complete cross reference generation.

Upvotes: 1
