Reputation: 3781
I have the following code:
corpus_file = codecs.open("corpus_en-tr.txt", encoding="utf-8").readlines()
corpus = []
for a in range(0, len(corpus_file), 2):
    corpus.append({'src': corpus_file[a].rstrip(), 'tgt': corpus_file[a+1].rstrip()})
params = {}
for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0
Basically, I am trying to create a dictionary of dictionaries of floats, but I get the following error:
Traceback (most recent call last):
  File "initial_guess.py", line 15, in <module>
    params[srcWord][tgtWord] = 1.0
KeyError: u'A'
I checked the question "UTF-8 string as key in dictionary causes KeyError", but it doesn't help.
I don't understand why the Unicode string 'A' is not allowed as a dictionary key in Python. Is there any way to fix it?
Upvotes: 0
Views: 3392
Reputation: 10284
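Your params dict is empty, so the lookup params[srcWord] raises a KeyError before the inner assignment can happen. It has nothing to do with Unicode.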
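A minimal sketch of why the lookup fails, using made-up keys 'A' and 'B' on an empty dict. The statement params['A']['B'] = 1.0 first reads params['A'], and that read is what raises:

```python
params = {}

try:
    # Python evaluates params['A'] (a lookup) before the item assignment;
    # the lookup fails because the outer dict has no 'A' entry yet.
    params['A']['B'] = 1.0
except KeyError as exc:
    missing_key = exc.args[0]

# Creating the inner dict first makes the same assignment succeed.
params['A'] = {}
params['A']['B'] = 1.0
```

So the key itself is fine; only the missing inner dict is the problem.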
You can use a recursive defaultdict (a "tree") to create the nested dicts on demand:
from collections import defaultdict

def tree():
    return defaultdict(tree)

params = tree()
params['any']['keys']['you']['want'] = 1.0
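One thing to keep in mind with this approach, shown as a rough sketch: a defaultdict inserts an entry whenever a missing key is read, so a typo in a lookup silently creates an empty subtree. The key names and the to_dict helper below are hypothetical, added only for illustration:

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

params = tree()
params['a']['b'] = 1.0

# Merely reading a missing key inserts an empty subtree as a side effect.
_ = params['typo']
keys_after_read = sorted(params.keys())

# Membership tests do not create entries.
has_other = 'other' in params

def to_dict(node):
    """Recursively convert a tree into plain dicts (hypothetical helper)."""
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node

plain = to_dict(params)
```

Converting to plain dicts once the structure is built restores the usual KeyError on missing keys.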
Or a simpler defaultdict case without tree:
from collections import defaultdict

params = defaultdict(dict)
for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0
If you don't want to import anything, just add an inner dict to params for each srcWord as you iterate:
params = {}
for sentencePair in corpus:
    for srcWord in sentencePair['src']:
        params.setdefault(srcWord, {})
        for tgtWord in sentencePair['tgt']:
            params[srcWord][tgtWord] = 1.0
Please note that I've changed the order of the for loops, because you need to know srcWord first.
If you keep the original loop order, the key has to be checked on every inner iteration:
params = {}
for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params.setdefault(srcWord, {})[tgtWord] = 1.0
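For a self-contained check, here is a sketch that runs the setdefault version on a tiny in-memory corpus (the two sentence pairs are made up and stand in for corpus_en-tr.txt). One assumption worth flagging: iterating over a string yields characters, which is why the error in the question names the single character u'A'. If word-level keys are what you want, calling split() first is one option:

```python
# Made-up sentence pairs standing in for the real corpus file.
corpus = [
    {'src': 'a house', 'tgt': 'bir ev'},
    {'src': 'a book', 'tgt': 'bir kitap'},
]

params = {}
for sentence_pair in corpus:
    # split() breaks on whitespace, giving words rather than characters.
    for tgt_word in sentence_pair['tgt'].split():
        for src_word in sentence_pair['src'].split():
            params.setdefault(src_word, {})[tgt_word] = 1.0
```

If character-level keys are actually intended, drop the split() calls and the loops behave as in the snippets above.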
Upvotes: 2
Reputation: 275
You can just use setdefault. Replace
params[srcWord][tgtWord] = 1.0
with
params.setdefault(srcWord, {})[tgtWord] = 1.0
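A quick sketch of why this one-liner is safe to call repeatedly (the keys 'A', 'x' and 'y' are placeholders): setdefault inserts the default only when the key is missing and otherwise returns the value already stored, so existing inner dicts are not overwritten.

```python
params = {}

# 'A' is missing: setdefault inserts a new {} and returns it.
inner = params.setdefault('A', {})
inner['x'] = 1.0

# 'A' exists: setdefault returns the stored dict and discards the new {}.
same = params.setdefault('A', {})
same['y'] = 1.0
```

Note that the {} argument is still constructed on every call, even when it ends up unused.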
Upvotes: 1