Why can't we use a CFG to scan/tokenize?

Question

I have a practice exam for my compilers course with the following questions that I can't get:

Why can't we use a context-free grammer (CFG) to scan/tokenize?
Why don't we use a deterministic finite automaton (DFA) for parsing?

Does anyone have any ideas?

templatetypedef · Accepted Answer

Both of these statements are incorrect.

You absolutely can use a CFG to do scanning and tokenization. In fact, every regular language is also context-free, so you could rewrite your scanner to scan using a context-free grammar instead of a regular expression. The main reason for not doing this is that it's usually overkill; tokens rarely have a complex structure to them, and a regular expression usually works just fine. However, you can think of instances where you might want to use a CFG. For example, in C++, the treatment of close angle braces in templates often requires special-casing by the compiler. For example, vector> should tokenize as vector < vector < int > >, though using a standard set of regular expressions the two closing braces would be scanned as the >> token. Using a context-free grammar to do scanning could alleviate this by having more context.

Also, you absolutely can use regular expressions for parsing, provided that your language is sufficiently simple. Most languages are too complex to be encoded with a regular expression (for example, anything involving nested parentheses cannot be parsed with regular expressions), so we tend to use CFGs instead, but there are languages that could be parsed with regular expressions. For example, a description of a DFA as a table like this one could definitely be parsed by a regular expression:

     0    1
 q0  q1   q0
 q1  q0   q1

Most real programming languages don't have a regular structure, though, and so in practice context-free grammars are used instead.

Hope this helps!

Why can't we use a CFG to scan/tokenize?

Answers (1)

Related Questions

Why can&#39;t we use a CFG to scan/tokenize?

Answers (1)

Related Questions

Why can't we use a CFG to scan/tokenize?