yusuf

Reputation: 3781

Unicode string KeyError in Python dict

I have the following code:

import codecs

# The corpus alternates lines: a source sentence, then its target translation
corpus_file = codecs.open("corpus_en-tr.txt", encoding="utf-8").readlines()

corpus = []
for a in range(0, len(corpus_file), 2):
    corpus.append({'src': corpus_file[a].rstrip(), 'tgt': corpus_file[a+1].rstrip()})

params = {}

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0  # the KeyError is raised here

I am trying to build a dictionary of dictionaries of floats, but I get the following error:

Traceback (most recent call last):
  File "initial_guess.py", line 15, in <module>
    params[srcWord][tgtWord] = 1.0
KeyError: u'A'

I checked the question "UTF-8 string as key in dictionary causes KeyError", but it doesn't help.

I don't understand why the Unicode string u'A' is not allowed as a key in a Python dict. Is there any way to fix this?

Upvotes: 0

Views: 3392

Answers (2)

skovorodkin

Reputation: 10284

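Your params dict is empty, so params[srcWord] raises KeyError before the inner assignment can run. Unicode has nothing to do with it: looking up any key that is not in a dict raises KeyError.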
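A minimal sketch reproducing the error with made-up keys: the lookup of the missing outer key fails, and the same assignment succeeds once an inner dict exists for that key.

```python
# The outer dict is empty, so params[u'A'] fails before the
# inner assignment [u'B'] = 1.0 is ever reached.
params = {}
caught = None
try:
    params[u'A'][u'B'] = 1.0
except KeyError as e:
    caught = e.args[0]  # the missing key, u'A'

# Once an inner dict exists for u'A', the same assignment works;
# Unicode strings are perfectly valid dict keys.
params[u'A'] = {}
params[u'A'][u'B'] = 1.0
```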

You can use a tree (a recursive defaultdict) for that:

from collections import defaultdict

def tree():
    return defaultdict(tree)

params = tree()
params['any']['keys']['you']['want'] = 1.0
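A short usage sketch, assuming you eventually want ordinary nested dicts back (for printing or serializing). The helper to_plain is a hypothetical name, not part of collections.

```python
from collections import defaultdict

def tree():
    # Each missing key creates another tree, so any depth works
    return defaultdict(tree)

def to_plain(t):
    # Hypothetical helper: recursively convert defaultdicts to plain dicts
    if isinstance(t, defaultdict):
        return {k: to_plain(v) for k, v in t.items()}
    return t

params = tree()
params['any']['keys']['you']['want'] = 1.0
params['any']['other'] = 2.0
plain = to_plain(params)
```

One caveat of this design: merely reading a missing key (for example params['typo']) also inserts an empty subtree, so typos silently add entries.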

Or, more simply, a plain defaultdict(dict) without the tree:

from collections import defaultdict

params = defaultdict(dict)

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0
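Separately, iterating directly over a string yields its characters, which is why the key in the traceback is the single letter u'A'. If the goal is word-to-word pairs, splitting each sentence on whitespace is probably what was intended; that is an assumption about the task. A sketch with an invented one-sentence corpus:

```python
from collections import defaultdict

# Invented sample in the same shape as the question's corpus list
corpus = [{'src': u'A cat', 'tgt': u'Bir kedi'}]

# Iterating over the raw string visits characters, not words
by_char = defaultdict(dict)
for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            by_char[srcWord][tgtWord] = 1.0

# str.split() with no argument splits on runs of whitespace
by_word = defaultdict(dict)
for sentencePair in corpus:
    for tgtWord in sentencePair['tgt'].split():
        for srcWord in sentencePair['src'].split():
            by_word[srcWord][tgtWord] = 1.0
```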

If you don't want to use defaultdict at all, create the inner dict for each source word with setdefault:

params = {}

for sentencePair in corpus:
    for srcWord in sentencePair['src']:
        params.setdefault(srcWord, {})
        for tgtWord in sentencePair['tgt']:
            params[srcWord][tgtWord] = 1.0

Note that I've swapped the order of the for loops, because srcWord must be known before its inner dict can be created.

Otherwise you have to check for the key on every iteration of the inner loop:

params = {}

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params.setdefault(srcWord, {})[tgtWord] = 1.0
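A quick sketch checking that both setdefault variants build the same mapping. The two-sentence corpus is invented, and splitting on whitespace is an assumption about the intent.

```python
corpus = [
    {'src': u'the house', 'tgt': u'ev'},
    {'src': u'the cat', 'tgt': u'kedi'},
]

# Variant 1: outer loop over source words, setdefault once per srcWord
params_a = {}
for sentencePair in corpus:
    for srcWord in sentencePair['src'].split():
        params_a.setdefault(srcWord, {})
        for tgtWord in sentencePair['tgt'].split():
            params_a[srcWord][tgtWord] = 1.0

# Variant 2: original loop order, setdefault on every inner iteration
params_b = {}
for sentencePair in corpus:
    for tgtWord in sentencePair['tgt'].split():
        for srcWord in sentencePair['src'].split():
            params_b.setdefault(srcWord, {})[tgtWord] = 1.0
```

The results are identical; variant 2 just calls setdefault more often.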

Upvotes: 2

Ailurus

Reputation: 275

You can just use setdefault:

Replace

params[srcWord][tgtWord] = 1.0

with

params.setdefault(srcWord, {})[tgtWord] = 1.0
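How setdefault behaves here, as a small sketch with made-up keys: if the key is missing it inserts the default and returns it; if the key is present it returns the existing value and ignores the default.

```python
params = {}

# First call: key missing, so {} is inserted and returned
inner = params.setdefault(u'A', {})
inner[u'x'] = 1.0

# Second call: key present, so the existing dict is returned and
# the new {} argument is simply discarded
same = params.setdefault(u'A', {})
same[u'y'] = 1.0
```

Because the default argument is evaluated on every call, a fresh empty dict is built each time even when it is thrown away; that is cheap for {} but worth knowing in tight loops.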

Upvotes: 1
