Primal Pappachan

Reputation: 26525

Building a lexical Analyzer in Java

I am currently learning lexical analysis as part of compiler design. To understand how a lexical analyzer really works, I am trying to build one myself, in Java.

The input to the lexical analyzer is a .tex file which is of the following format.

\begin{document}

    \chapter{Introduction}

    \section{Scope}

    Arbitrary text.

    \section{Relevance}

    Arbitrary text.

    \subsection{Advantages}

    Arbitrary text.

    \subsubsection{In Real life}

    \subsection{Disadvantages}

    \end{document}

The output of the lexer should be a table of contents, possibly with page numbers, written to another file.

1. Introduction              1
  1.1 Scope                  1
  1.2 Relevance              2
    1.2.1 Advantages         2
      1.2.1.1 In Real Life   2
    1.2.2 Disadvantages      3

I hope that this problem is within the scope of the lexical analysis.

My lexer would read the .tex file and look for '\'. On finding one, it keeps reading to check whether what follows is indeed one of the sectioning commands. A flag variable is set to indicate the type of sectioning. The text in curly braces following the sectioning command is then read and written out, prefixed with a number (like 1.2.1) that depends on the type and depth.

I hope the above approach would work for building the lexer. How do I go about adding page numbers to the table of contents, if that is possible within the scope of the lexer?
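For illustration, the numbering step described above could be sketched roughly as follows. This is only a sketch under assumptions: it uses `java.util.regex` in place of a character-by-character scan, it assumes titles contain no nested braces, and the class name `TocNumbering` and method `number` are made up for the example. It keeps one counter per sectioning depth and resets the deeper counters whenever a shallower heading appears.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns sectioning commands into numbered TOC entries.
class TocNumbering {
    // Sectioning commands ordered by depth: index 0 is the shallowest.
    private static final List<String> LEVELS =
            List.of("chapter", "section", "subsection", "subsubsection");

    // Matches e.g. \section{Scope}; group 1 is the command, group 2 the title.
    // Assumption: the title contains no nested braces.
    private static final Pattern HEADING = Pattern.compile(
            "\\\\(chapter|section|subsection|subsubsection)\\{([^}]*)\\}");

    static List<String> number(String tex) {
        int[] counters = new int[LEVELS.size()];
        List<String> entries = new ArrayList<>();
        Matcher m = HEADING.matcher(tex);
        while (m.find()) {
            int depth = LEVELS.indexOf(m.group(1));
            counters[depth]++;
            // A new heading at this depth resets every deeper counter.
            for (int i = depth + 1; i < counters.length; i++) {
                counters[i] = 0;
            }
            // Build the label from the counters down to this depth, e.g. 1.2.1
            StringBuilder label = new StringBuilder();
            for (int i = 0; i <= depth; i++) {
                if (i > 0) label.append('.');
                label.append(counters[i]);
            }
            // Indent two spaces per level of depth.
            entries.add("  ".repeat(depth) + label + " " + m.group(2));
        }
        return entries;
    }

    public static void main(String[] args) {
        String tex = "\\chapter{Introduction}\n\\section{Scope}\n"
                + "\\section{Relevance}\n\\subsection{Advantages}\n";
        number(tex).forEach(System.out::println);
    }
}
```

Page numbers are not handled here, because the .tex source alone does not say where page breaks fall.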

Upvotes: 0

Views: 2748

Answers (2)

user207421

Reputation: 310860

What you describe is really a lexer plus a parser. The job of the lexical analyser here is to return tokens and ignore whitespace. The tokens here are the various keywords introduced by '\', string literals inside '{' and '}', and arbitrary text elsewhere. Everything else you described is parsing and tree-building.
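A minimal sketch of a lexer along those lines might look like this. The token names (`COMMAND`, `ARGUMENT`, `TEXT`) and the class `TexLexer` are invented for the example, and it assumes braces are never nested. It only splits the input into tokens; deciding what a `\section` followed by an argument means is left to the parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer for the simplified .tex format in the question.
class TexLexer {
    enum Type { COMMAND, ARGUMENT, TEXT }

    static final class Token {
        final Type type;
        final String value;
        Token(Type type, String value) { this.type = type; this.value = value; }
        @Override public String toString() { return type + "(" + value + ")"; }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace between tokens is ignored
            } else if (c == '\\') {
                // Keyword: the run of letters following the backslash.
                int start = ++i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
                tokens.add(new Token(Type.COMMAND, input.substring(start, i)));
            } else if (c == '{') {
                // String literal: everything up to the next '}' (no nesting assumed).
                int close = input.indexOf('}', i);
                if (close < 0) close = input.length(); // unterminated: take the rest
                tokens.add(new Token(Type.ARGUMENT, input.substring(i + 1, close)));
                i = Math.min(close + 1, input.length());
            } else {
                // Arbitrary text: runs until the next command or opening brace.
                int start = i;
                while (i < input.length()
                        && input.charAt(i) != '\\' && input.charAt(i) != '{') i++;
                tokens.add(new Token(Type.TEXT, input.substring(start, i).trim()));
            }
        }
        return tokens;
    }
}
```

A parser could then walk this token list, treat a `COMMAND` such as `section` followed by an `ARGUMENT` as a heading, and build the tree from there.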

Upvotes: 0

corsiKa

Reputation: 82559

You really could add them any way you want. I would recommend storing the contents of your .tex file in your own tree-like or map-like structure, then read in your page numbers file, and apply them appropriately.
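As a rough sketch of the map-like variant: the example below assumes the table of contents is held as an insertion-ordered map from section label to title, and that the page numbers have already been read into a second map keyed by the same labels. The format of the page numbers file is not specified here, so that keying, and the names `TocPages` and `attachPages`, are purely assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: joins TOC entries with separately supplied page numbers.
class TocPages {
    // toc:   section label -> title, in document order (e.g. "1.1" -> "Scope")
    // pages: section label -> page number (assumed layout of the page data)
    static List<String> attachPages(Map<String, String> toc, Map<String, Integer> pages) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : toc.entrySet()) {
            Integer page = pages.get(entry.getKey());
            // Entries without a known page are written without a number.
            String suffix = (page == null) ? "" : " " + page;
            lines.add(entry.getKey() + " " + entry.getValue() + suffix);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> toc = new LinkedHashMap<>();
        toc.put("1", "Introduction");
        toc.put("1.1", "Scope");
        attachPages(toc, Map.of("1", 1, "1.1", 1)).forEach(System.out::println);
    }
}
```

Using a `LinkedHashMap` keeps the entries in the order they were found in the document, which a plain `HashMap` would not guarantee.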

A more archaic option would be to write a second parser that parses the output of your first parser together with the page numbers file and appends the numbers appropriately.

It really is up to you. Since this is a learning exercise, try to build as if someone else were to use it. How user-friendly is it? Making something only you can use is still good for concept learning, but could lead to messy practices if you ever use it in the real world!

Upvotes: 2
