minks

Reputation: 3039

Why do I receive this error on parsing?

I am reading in a text file and converting it into a Python dictionary.

The file looks like this, with each line holding a label followed by a word or phrase:

20001   World Economies

20002   Politics

20004   Internet Law

20005   Philipines Elections

20006   Israel Politics

20007   Science

This is the code to read the file and create a dictionary:

def get_pair(line):
    key, sep, value = line.strip().partition("\t")
    return int(key), value


with open("mapped.txt") as fd:
    d = dict(get_pair(line) for line in fd)
print(d)

I receive {} when I print the contents of d. Additionally, I receive this error:

Traceback (most recent call last):
  File "predicter.py", line 23, in <module>
    d = dict(get_pair(line) for line in fd)
  File "predicter.py", line 23, in <genexpr>
    d = dict(get_pair(line) for line in fd)
  File "predicter.py", line 19, in get_pair
    return int(key), value
ValueError: invalid literal for int() with base 10: ''

What does this mean? The file does have content, so I am not sure why it is not being read.

Upvotes: 1

Views: 33

Answers (1)

Martijn Pieters

Reputation: 1125248

It means key is empty, which in turn means you have a line with a \t tab at the start or an empty line:

>>> '\tScience'.partition('\t')
('', '\t', 'Science')
>>> ''.partition('\t')
('', '', '')

My guess is that it is the latter; you can skip both kinds of line in your generator expression:

d = dict(get_pair(line) for line in fd if '\t' in line.strip())

Because line.strip() returns the line without leading and trailing whitespace, an empty line or a line whose only tab is at the start yields a string with no tab in it at all, so the filter drops it. This won't handle all cases (a non-numeric key before the tab still raises ValueError), but you could also strip the value passed to get_pair():

d = dict(get_pair(line.strip()) for line in fd if '\t' in line.strip())
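
If you would rather find out which lines are causing trouble than silently drop them, an explicit loop can collect them as it goes. This is only a sketch: parse_mapping and the skipped list are names I made up for illustration, and the sample lines are invented to mirror the file in the question.

```python
def parse_mapping(lines):
    """Build {int_key: label} from tab-separated lines.

    Hypothetical helper: returns the mapping plus a list of
    (line_number, raw_line) pairs for every line that did not parse.
    """
    mapping = {}
    skipped = []
    for lineno, raw in enumerate(lines, start=1):
        key, sep, value = raw.strip().partition("\t")
        if not sep:
            # Empty line, or a line whose only tab was at the start or end.
            skipped.append((lineno, raw))
            continue
        try:
            mapping[int(key)] = value.strip()
        except ValueError:
            # There is a tab, but the text before it is not an integer.
            skipped.append((lineno, raw))
    return mapping, skipped


# Invented sample data standing in for the lines of mapped.txt.
sample = [
    "20001\tWorld Economies\n",
    "\n",
    "\tScience\n",
    "20002\tPolitics\n",
]
d, skipped = parse_mapping(sample)
for lineno, raw in skipped:
    # repr() makes stray tabs and newlines visible.
    print(lineno, repr(raw))
```

With the real file you would pass the open file object instead, as in d, skipped = parse_mapping(fd) inside the with block. Printing repr(raw) for each skipped line shows whether the culprits are blank lines or lines that begin with a tab.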

Upvotes: 3
