ddd
ddd

Reputation: 5029

How to implement an algorithm to check Python'code's indentation

I was recently interviewed with a local tech company for Java developer role and was asked to write a Java program to check the correctness of indentation in python code (at least find the first indentation error).

It may be easier to deal with def, while and for. However it is tricky to handle things like if...elif...else. There are different situations, for example, just if no else or elif; nested ifs. If it is paired, maybe I can use a stack, but you don't know whether they are paired or not.

I could really use some advice here.

Upvotes: 1

Views: 1412

Answers (1)

PSub
PSub

Reputation: 266

The way I see it, there's two distinct ways that you can solve this question.

  1. Convert the python code to some intermediary format which resembles a Context-Free Grammar (CFG) and then perform the error checking on the CFG.
  2. Case checking.

The former is a little more elegant and revolves around compiler theory, or at the very minimum, Automata theory. While this process is guaranteed to work and establishes some sort of language that people can refer to, it may be extremely tedious for something that is time sensitive. It's an elaborate fix to what may seemingly be, a simple task. The advantage of this technique would be that, if we're looking for a "hacky" solution, such as find the ":" and then checking if the next line is indented or not, this may not work for certain inline commands in python that use ":". An example for this could be print("Enter your name:"), or a subprocess.Popen command. This case will ensure that errors like this are avoided.

The latter on the other hand is extremely difficult to keep track of as a programmer and debugging it would be fairly difficult. I say this in some relativistic measure, there would be several cases to check, top level statements. So let's use some "good programming practices", since the keywords def if elif else class etc can be stored in a common locationIn order to solve this, we'd declare some variable (let's call it i, and read in the file line by line and checking the first (or zeroth character). If the character is not a space, we will read till the next whitespace and check if that word exists in some Trie of words that define indentation blocks (you could use a hash as well, doesn't make a difference). If it is in one of the blocks, you increment i by 4 and move on to the next line. In the subsequent line, you'd read up to the ith character and ensure they're spaces. Essentially, you'd repeat this process until you find something that contradicts what is being looked for. Now, if something does contradict what we're looking for, we may have to read the previous line. Consider

long_sequence = [1,2,3,4,1,2,3,4,1,2,3,4, 5,6,7]

Now, this would technically throw an error, so we have to consider this case and several cases which are similar to this where there is a line which is longer than a single line but the command is not complete in the one line.

So, as previously mentioned solution 2 would be very tedious and a debugging nightmare, 1 is elegant but really hard to construct. It really does depend on the timeline that you have to construct it.

Upvotes: 1

Related Questions