treuss

Reputation: 2388

ast.literal_eval not working as part of list comprehension (when reading a file)

I am trying to parse a file which has pairs of lines, each of them representing a list of integers or other lists. Example data from the file:

[[[6,10],[4,3,[4]]]]
[[4,3,[[4,9,9,7]]]]

[[6,[[3,10],[],[],2,10],[[6,8,4,2]]],[]]
[[6,[],[2,[6,2],5]]]

I am trying to read the file into a list of tuples of data-structures (nested lists) with the following statement:

with open("filename","r") as fp:
    pairs = [tuple(ast.literal_eval(l.strip()) for l in lines.split("\n")) for lines in fp.read().split("\n\n")]

This failed with the stack trace below, leading me to believe that the data was corrupt somewhere (unmatched brackets or something similar):

Traceback (most recent call last):
  File "program.py", line 5, in <module>
    pairs = [tuple(ast.literal_eval(l.strip()) for l in lines.split("\n")) for lines in fp.read().split("\n\n")]
  File "program.py", line 5, in <listcomp> 
    pairs = [tuple(ast.literal_eval(l.strip()) for l in lines.split("\n")) for lines in fp.read().split("\n\n")]
  File "program.py", line 5, in <genexpr>  
    pairs = [tuple(ast.literal_eval(l.strip()) for l in lines.split("\n")) for lines in fp.read().split("\n\n")]
  File "C:\Python39\lib\ast.py", line 62, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "C:\Python39\lib\ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 0

SyntaxError: unexpected EOF while parsing

So I broke the program down into separate steps, and the problem was no longer reproducible. The code below, which first reads the file into a list of tuples of strings and then evaluates the strings with ast.literal_eval, works fine. The "doing-it-all-at-once" version above still fails with the same error.

# This works:
with open("filename","r") as fp:
    stringpairs = [tuple(l.strip() for l in lines.split("\n")) for lines in fp.read().split("\n\n")]
pairs = [tuple(ast.literal_eval(pair[i]) for i in range(2)) for pair in stringpairs]

# This still doesn't work:
with open("filename","r") as fp:
    pairs = [tuple(ast.literal_eval(l.strip()) for l in lines.split("\n")) for lines in fp.read().split("\n\n")]

Upvotes: 0

Views: 1113

Answers (3)

Nice Zombies

Reputation: 1087

In that case, just strip the whitespace from the beginning and end of the file contents before splitting, and the problem is solved!

import ast
with open("file.txt", "w") as file:
    file.write("[[[6,10],[4,3,[4]]]]\n")
    file.write("[[4,3,[[4,9,9,7]]]]\n")
    file.write("\n")
    file.write("[[6,[[3,10],[],[],2,10],[[6,8,4,2]]],[]]\n")
    file.write("[[6,[],[2,[6,2],5]]]\n") # evil newline
pairs = []
with open("file.txt","r") as file:
    pairs = [tuple(ast.literal_eval(line.strip()) for line in multiline.split("\n")) for multiline in file.read().strip().split("\n\n")]
    for pair in pairs:
        print(pair)

Output:

([[[6, 10], [4, 3, [4]]]], [[4, 3, [[4, 9, 9, 7]]]])
([[6, [[3, 10], [], [], 2, 10], [[6, 8, 4, 2]]], []], [[6, [], [2, [6, 2], 5]]])

Upvotes: 0

Barmar

Reputation: 781004

The problem is that fp.read().split("\n\n") is leaving the final newline at the end of the last pair of lines.

Then when you do lines.split('\n') you get 3 lines in the last group, not just 2; the last line in this group is empty, and ast.literal_eval('') gets an error.

So strip this newline off before calling lines.split('\n').

import ast

with open("filename", "r") as fp:
    pairs = [tuple(ast.literal_eval(l.strip())
                   for l in lines.strip().split("\n"))
             for lines in fp.read().split("\n\n")]
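To see the effect without touching a file, here is a minimal sketch that uses an in-memory string in place of the file contents. The variable text and its sample data are made up for illustration; the only assumption is that the real file ends with a single newline, as described above.

```python
import ast

# Hypothetical stand-in for fp.read(): two pairs, file ends with one newline.
text = "[[1],[2]]\n[3]\n\n[4,[5]]\n[[6]]\n"

groups = text.split("\n\n")

# The last group still carries the trailing newline, so splitting it on "\n"
# yields a third, empty item.
last_lines = groups[-1].split("\n")

# ast.literal_eval rejects the empty string.
try:
    ast.literal_eval(last_lines[-1])
    empty_string_failed = False
except (SyntaxError, ValueError):
    empty_string_failed = True

# Stripping each group before the inner split removes the empty item.
pairs = [tuple(ast.literal_eval(l.strip())
               for l in lines.strip().split("\n"))
         for lines in groups]
```

Here last_lines has three items, the last of which is the empty string, and pairs ends up containing two 2-tuples.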

Upvotes: 1

treuss

Reputation: 2388

Ok, I found the reason. The file does not end with an empty line, only with a single linefeed after the last line. The outer split on \n\n therefore produces the correct segments, but while every segment except the last looks like line1\nline2, the last segment looks like line42\nline43\n. The inner split on \n splits this into three parts: line42, line43 and an empty string. Since ast.literal_eval is called on every item of the split, it is also called on the empty string, which fails.

The second solution avoids it as it does not iterate over all elements of the tuple but explicitly only over the first two.
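Another way to make the one-pass version robust is to skip empty lines before evaluating them. The following is only a sketch: parse_pairs is a hypothetical helper name, and the sample string is made-up data in the same format as the file.

```python
import ast

def parse_pairs(text):
    """Hypothetical helper: parse blank-line-separated groups of literals."""
    pairs = []
    for group in text.split("\n\n"):
        # Keep only lines that have content, so a trailing newline
        # does not produce an empty string for ast.literal_eval.
        lines = [line.strip() for line in group.split("\n") if line.strip()]
        if lines:
            pairs.append(tuple(ast.literal_eval(line) for line in lines))
    return pairs

# Made-up sample in the same format, ending with a single newline.
sample = "[[6,10],[4]]\n[[4,3]]\n\n[6,[]]\n[[2,5]]\n"
pairs = parse_pairs(sample)
```

With this approach the result should be the same whether or not the input ends with a newline, because empty lines never reach ast.literal_eval.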

Upvotes: 0
