Python: dynamic list parsing and processing

Question

I have popened a process which is producing a list of dictionaries, something like:

[{'foo': '1'},{'bar':2},...]

The list takes a long time to create and could be many gigabytes, so I don't want to reconstitute it in memory and then iterate over it.

How can I parse the partially completed list such that I can process each dictionary as it is received?

Alex Martelli · Accepted Answer

The Python tokenizer is available in the standard library as the module tokenize. It takes as input a readline function, which it calls each time it needs another "line" of input, so it can operate incrementally. If there are no newlines in your input, you can simulate them as long as you can identify spots where adding a newline is innocuous, that is, where it does not break up a token (thanks to the starting [, everything will be one "logical" line anyway). The only tokens that need care to avoid being broken are quoted strings. I'm not pursuing this in depth right now, since if you actually have newlines in your input you won't need to worry.
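For the no-newlines case, here is a rough sketch of one way to simulate lines: read the stream in chunks and end a "line" after every comma that falls outside a quoted string. The name make_readline and the chunk_size parameter are made up for illustration. It assumes the producer emits repr-style output, meaning single- or double-quoted strings with backslash escapes and no triple-quoted strings, and that the stream is opened in text mode.

```python
import io
import tokenize


def make_readline(stream, chunk_size=65536):
    """Return a readline() callable that ends a line after each comma outside a string.

    Hypothetical helper; assumes repr-style string literals (no triple quotes).
    """
    def lines():
        quote = None     # active quote character, or None when outside a string
        escaped = False  # True right after a backslash inside a string
        buf = []
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            for ch in chunk:
                buf.append(ch)
                if quote is not None:
                    if escaped:
                        escaped = False
                    elif ch == "\\":
                        escaped = True
                    elif ch == quote:
                        quote = None
                elif ch in "'\"":
                    quote = ch
                elif ch == "\n":
                    # A real newline already ends the line.
                    yield "".join(buf)
                    buf = []
                elif ch == ",":
                    # Safe split point: inside the outer [ a newline is ignored.
                    yield "".join(buf) + "\n"
                    buf = []
        if buf:
            yield "".join(buf) + "\n"

    it = lines()
    # tokenize expects an empty string once the input is exhausted.
    return lambda: next(it, "")


# Demo on an in-memory stream with no newlines; the comma inside 'x,y' is left alone.
demo = io.StringIO("[{'a': 'x,y'},{'b': 2}]")
for tok in tokenize.generate_tokens(make_readline(demo, chunk_size=4)):
    if tok.type == tokenize.STRING:
        print(tok.string)
```

Note that read(chunk_size) on a pipe can wait until a full chunk is available, so a smaller chunk size trades throughput for lower latency.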

From the stream of tokens you can reconstruct the string representing each dict in the list (from an opening brace token to the balancing closing brace), and use ast.literal_eval to get the corresponding Python dict.
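A minimal sketch of that idea, assuming the input already contains newlines and holds only literals that ast.literal_eval accepts. The function name iter_dicts is made up for illustration. It tracks brace depth, collects the token strings of one dict at a time, and evaluates them once the balancing brace arrives, so only one dict is held in memory at any moment.

```python
import ast
import io
import tokenize


def iter_dicts(readline):
    """Yield each top-level dict of a streamed Python list literal, one at a time."""
    depth = 0    # brace nesting depth; 0 means we are between dicts
    pieces = []  # token strings of the dict currently being collected
    for tok in tokenize.generate_tokens(readline):
        # Line breaks and comments inside the list carry no data.
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue
        is_op = tok.type == tokenize.OP
        if is_op and tok.string == "{":
            depth += 1
        if depth == 0:
            continue  # the outer [ and ], and the commas between dicts
        pieces.append(tok.string)
        if is_op and tok.string == "}":
            depth -= 1
            if depth == 0:
                # Balancing brace reached: rebuild the source text and evaluate it.
                yield ast.literal_eval(" ".join(pieces))
                pieces = []


# Demo on an in-memory stream; a real caller would pass the pipe's readline.
source = io.StringIO("[{'foo': '1'},\n {'bar': 2, 'baz': {'x': [1, -2]}}]\n")
for d in iter_dicts(source.readline):
    print(d)
```

With a child process started via subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True), where cmd stands in for your own command, you would pass proc.stdout.readline instead of the StringIO's readline.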

So, do you have newlines in your input? If so, the whole task should be very easy.
