Python: dictionary of words and word forms

Question

I have the following problem: I created a (German) dictionary that maps words to their lemmas. Examples: "Lagerbestände", "Lager-bestand"; "Wohnhäuser", "Wohn-haus"; "Bahnhof", "Bahn-hof".

I now have a text, and I want to look up the lemma of every word in it. A word may appear that is not in the dictionary, such as "Restbestände", even though the lemma of its second part, "bestände", is already known. In that case I want to take the unknown first part of the word, append the lemmatized second part, and print (or return) the result. Example: "Restbestände" --> "Rest-bestand" ("bestand" is taken from the lemma of "Lagerbestände").

I coded the following:

for limit in range(1, len(Word)): 
    for k, v in dicti.iteritems():
        if re.search('[\w]*'+Word[limit:], k, re.IGNORECASE) != None:
            if '-' in v:
                tmp = v.find('-')
                end = v[tmp:]
                end = re.sub(ur'[-]',"", end)
                Word = Word[:limit] + '-' + end

But I have two problems:

A " " is printed at the end of every word. How can I avoid this?
The second part of the word is sometimes incorrect, so there must be a logic error.

How would you solve this?

Answers (1)
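
One possible approach, as a sketch rather than a definitive fix: the loop in the question reassigns Word while it is still iterating, and the pattern '[\w]*' + Word[limit:] matches any key that merely contains the tail somewhere, so a very short tail can match almost any entry. Building a lookup table of known second parts once, and then testing the tails of the unknown word from longest to shortest, avoids both issues. The helper names build_tail_index and lemmatize are hypothetical, the code targets Python 3 (hence .items() instead of .iteritems()), and it assumes the first part of each lemma is an unchanged prefix of the surface word (no linking letters between the parts).

```python
def build_tail_index(dicti):
    """Map each known surface second part (lower-cased) to its lemma."""
    tails = {}
    for word, lemma in dicti.items():
        head, sep, tail_lemma = lemma.partition('-')
        if not sep:
            continue  # lemma has no hyphen, so there is no second part to learn
        # Assumption: the surface word starts with the lemma's first part.
        if not word.lower().startswith(head.lower()):
            continue
        surface_tail = word[len(head):].lower()
        if surface_tail:
            # Keep the first lemma seen for a given surface tail.
            tails.setdefault(surface_tail, tail_lemma)
    return tails


def lemmatize(word, dicti, tails):
    """Return the stored lemma, or first part + '-' + lemma of a known tail."""
    if word in dicti:
        return dicti[word]
    # limit grows from 1, so the longest candidate tail is tried first.
    for limit in range(1, len(word)):
        tail = word[limit:].lower()
        if tail in tails:
            return word[:limit] + '-' + tails[tail]
    return word  # nothing known about this word: leave it unchanged


dicti = {
    "Lagerbestände": "Lager-bestand",
    "Wohnhäuser": "Wohn-haus",
    "Bahnhof": "Bahn-hof",
}
tails = build_tail_index(dicti)
print(lemmatize("Restbestände", dicti, tails))  # Rest-bestand
```

Because the function returns on the first (longest) exact tail match and never modifies its input, the second part can no longer be replaced repeatedly. Returning the value instead of printing inside the loop also leaves the caller in control of the output formatting.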