Reputation: 474
I've written a parser that extracts questions from past exam papers and tabulates how often particular questions/topics come up over the years. It stores each question/topic as a dictionary key and the dates as a list under that key, and is supposed to combine both as follows:
questions = {'Question1':['April 2011', 'May 2016'], 'Question2': ['June 2013']}
Problem is, I'm unable to update the list of dates within the dictionary. A snippet of my code is as follows:
def extract_topics_dates(file):
    corpus = ''
    topics = []
    questions = {}
    year = []
    pdf_reader = PyPDF2.PdfFileReader(open(file, 'rb'))
    for page in pdf_reader.pages:
        #For each page, get corpus of text.
        for line in page.extractText().splitlines():
            corpus = corpus + line
        #For each page, extract topics.
        for i in [phrase for phrase in map(str.strip, re.split('\d+\s\s', corpus)) if phrase]:
            topics.append(extract_topic(i))
        topics = [x for x in topics if x is not None]
        #For each page, extract date.
        year = set([x for x in year if x is not None])
        year.add(get_date(page))
        #For each page, now combine the topic + date.
        for i in topics:
            questions[i].add(year)
    return questions
Everything in this function works as intended, except the last line, questions[i].add(year), which raises a KeyError. Where am I going wrong?
Upvotes: 1
Views: 52
Reputation: 14660
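The KeyError is raised because questions starts out empty, so questions[i] looks up a key that does not exist yet. You need to create a list for the key before adding anything to it (and note that lists use append, not add). Change that for loop to the following: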
for i in topics:
    if i not in questions:
        questions[i] = list()
    questions[i].append(year)
Or, as suggested by @Jon Clements:
for topic in topics:
    questions.setdefault(topic, []).append(year)
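A third option is collections.defaultdict, which creates the empty list automatically on first access. Below is a minimal sketch of that idea rather than a drop-in replacement: combine_topics_dates and sample_pages are hypothetical names standing in for the PDF parsing, and it assumes each page yields a single date string. Keep in mind that year in the question's code is a set, so append(year) would store the whole set in the list; appending the individual date string, as here, gives the shape shown in the question.

```python
from collections import defaultdict


def combine_topics_dates(pages):
    """Map each topic to the list of dates it appeared on.

    `pages` is a hypothetical iterable of (topics, date) pairs, standing in
    for whatever extract_topic() and get_date() return for each PDF page.
    """
    questions = defaultdict(list)  # missing keys start as an empty list
    for topics, date in pages:
        if date is None:
            continue  # skip pages where no date could be extracted
        for topic in topics:
            # Avoid recording the same date twice for one topic.
            if date not in questions[topic]:
                questions[topic].append(date)
    return dict(questions)  # convert back to a plain dict for the caller


# Made-up sample data, one (topics, date) pair per page.
sample_pages = [
    (['Question1'], 'April 2011'),
    (['Question2'], 'June 2013'),
    (['Question1'], 'May 2016'),
]
result = combine_topics_dates(sample_pages)
print(result)
```

With a defaultdict there is no need for the membership check or for setdefault; the trade-off is that merely reading a missing key also creates it, which is why the sketch converts back to a plain dict before returning.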
Upvotes: 1