Nume

Reputation: 183

Python - sentence to a dictionary

I am trying to write code that takes a sentence:

dimension implies direction implies measurement implies the more and the less

and converts it into a dictionary in which each word is a key and its value is the word (or words) that directly precede it. The first word should have NO value.

It should essentially be:

{'and' : 'more'
'dimension' : ''
'direction' : 'implies'
'implies' : 'dimension', 'direction', 'measurement'
'less' : 'the'
'measurement' : 'implies'
'more' : 'the'
'the' : 'and', 'implies'}

I wrote:

def get_previous_words_dict(text):
    words_list = text.split()
    sentence_dict = {}
    for i in range(0,len(words_list)):
        sentence_dict[words_list[i]] = words_list[i-1]

But it doesn't add a value to the existing values of a key; it replaces the old value, so instead of getting 3 different values for 'implies' I only get 1.

Also, instead of assigning NO value to the word dimension, it assigns 'less', because at i = 0 the index i-1 is -1, which refers to the last word.

Upvotes: 5

Views: 8164

Answers (3)

PM 2Ring

Reputation: 55479

Here's how to do it without a defaultdict:

text = 'dimension implies direction implies measurement implies the more and the less'
sentence_dict = {}
prev = ''
for word in text.split():
    if word not in sentence_dict:
        sentence_dict[word] = []
    sentence_dict[word].append(prev)
    prev = word

print(sentence_dict)

output

{'and': ['more'], 'direction': ['implies'], 'implies': ['dimension', 'direction', 'measurement'], 'less': ['the'], 'measurement': ['implies'], 'the': ['implies', 'and'], 'dimension': [''], 'more': ['the']}

Here's a more compact way, using setdefault:

text = 'dimension implies direction implies measurement implies the more and the less'

sentence_dict = {}
prev = ''
for word in text.split():    
    sentence_dict.setdefault(word, []).append(prev)
    prev = word

print(sentence_dict)

The previous version is probably a little easier to read.
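For reference, here is a minimal sketch of the same loop wrapped in the function from the question. It makes one assumption that differs from the snippets above: the first word gets an empty list rather than [''], which is one possible reading of "NO value". The function name comes from the question; the variable `preceding` is only illustrative.

```python
def get_previous_words_dict(text):
    """Map each word to the list of words that directly precede it."""
    sentence_dict = {}
    prev = None  # nothing precedes the first word
    for word in text.split():
        # Create the list the first time a word is seen, then reuse it.
        preceding = sentence_dict.setdefault(word, [])
        if prev is not None:
            preceding.append(prev)
        prev = word
    return sentence_dict


text = 'dimension implies direction implies measurement implies the more and the less'
result = get_previous_words_dict(text)
print(result['implies'])    # ['dimension', 'direction', 'measurement']
print(result['dimension'])  # []
```

If you would rather keep the '' placeholder for the first word, start with prev = '' and drop the None check.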

Upvotes: 7

emvee

Reputation: 4449

If you're not allowed to import anything, then a nifty reduce operation together with slicing and zip might be a very compact way to do it. All of these are built-ins in Python 2 and need no imports; in Python 3, reduce lives in functools and has to be imported from there:

EDIT: After it was pointed out to me that I had misunderstood the problem, I fixed it by changing the zip() call.

# the string - split it immediately into a list of words
words = "dimension implies direction implies measurement implies the more and the less".split()

# There is a **lot** going on in this line of code, explanation below.
result = reduce(lambda acc, kv: acc.setdefault(kv[0], []).append(kv[1]) or acc,
                zip(words[1:], words[:-1]), {})
# this was the previous - incorrect - zip()
#                zip(words[1::2], words[0::2]), {})

And outputting the result (also edited)

print result
{'and': ['more'], 'direction': ['implies'], 'implies': ['dimension',
 'direction', 'measurement'], 'less': ['the'], 'measurement':['implies'],
 'the': ['implies', 'and'], 'more': ['the']}

For completeness' sake, the old, erroneous, result:

print result
{'the': ['and'], 'implies': ['dimension', 'direction', 'measurement'], 'more': ['the']}

A bit of explanation

After having split the string into a list of words, we can index the individual words as words[i].

Edited: By the problem statement, each key of the resulting dict is a word that follows another word, and its value is the word that precedes it. So we must transform the list of words into a list of pairs of each word with the word before it. For a list of n words, the list of keys will be [words[1], words[2], ..., words[n-1]] and the values that go with them are [words[0], words[1], ..., words[n-2]].

Using Python slicing: keys = words[1:] and values = words[:-1]

Now we need to create a dict of those keys and values, aggregating values into a list, if the same key occurs multiple times.

A dict has a method .setdefault(key, value), which sets key's value to value if key is not in the dict yet and, in either case, returns the value now stored under key. By default-initializing all values to the empty list ([]) we can blindly call .append(...) on the result. That's what this part of the code does:

acc.setdefault(key, []).append( value )
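A small sketch of that behaviour in isolation may help. The names `first` and `second` are only illustrative and are not part of the one-liner above:

```python
acc = {}

# Key is missing: setdefault stores the empty list and returns it.
first = acc.setdefault('implies', [])
first.append('dimension')

# Key is present: setdefault ignores the new [] and returns the stored list.
second = acc.setdefault('implies', [])
second.append('direction')

print(acc)              # {'implies': ['dimension', 'direction']}
print(first is second)  # True
```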

Then there is reduce. A reduce operation reduces (...) a list of values into one. In this case we will reduce a list of (key, value) tuples into a dict where we accumulated all values to their respective key.

reduce takes a callback reduction function and an initial element. The initial element here is the empty dict {} - we'll be filling that in as we go along.

The callback reduction function is called repeatedly with two arguments, the accumulator and the next element to add to the accumulation. The function should return the new accumulator.

In this code, the reduction step is basically the addition of the element's value to the list of values for the element's key. (See above - that's what the .setdefault().append() does).

All we need is to get a list of (key, value) tuples that we need to process. That's where the built-in zip appears. zip takes two lists and returns a list of tuples of corresponding elements.

Thus:

zip(words[1:], words[:-1])

produces exactly what we want: the list of all (key, value) tuples.
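To see the pairs concretely, here is a quick check on a shortened sentence (shortened only to keep the output readable). The list() call is an addition for Python 3, where zip returns a lazy iterator instead of a list:

```python
words = "dimension implies direction implies".split()

# Each word from the second one onwards, paired with the word before it.
pairs = list(zip(words[1:], words[:-1]))

print(pairs)
# [('implies', 'dimension'), ('direction', 'implies'), ('implies', 'direction')]
```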

Finally, because the reducing function needs to return the new accumulator, we have to play a trick. list.append(...) returns None, even though the actual dict has been modified. Thus we cannot return that value as the next accumulator. So we add the construction or acc after that.

Because the left-hand side of the logical or always evaluates to None, which is logically False in Python, the right-hand side is always 'evaluated' - in this case the (modified) dict itself. The net result of the or evaluates therefore to the modified dict itself, which is exactly what we need to return.
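Spelled out as a named function, the reduction step might look like the sketch below. This is just the lambda unrolled for readability, written for Python 3 (hence the functools import); the names `step` and `appended` are made up for the illustration.

```python
from functools import reduce  # required on Python 3


def step(acc, kv):
    key, value = kv
    # list.append returns None, even though the list inside acc has grown.
    appended = acc.setdefault(key, []).append(value)
    # None is falsy, so "or" falls through and yields acc.
    return appended or acc


words = "dimension implies direction implies measurement implies the more and the less".split()
result = reduce(step, zip(words[1:], words[:-1]), {})

print(result['implies'])  # ['dimension', 'direction', 'measurement']
print(result['the'])      # ['implies', 'and']
```

Note that 'dimension' never appears as a key here, because the first word is never the first element of a pair.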

Upvotes: 0

Kavin Eswaramoorthy

Reputation: 1625

Just split the string into a list, create a second list by prefixing an empty string so that it is offset by one, then zip the two and build the dictionary by iterating over the pairs. PS: use a defaultdict initialized with list instead of a plain dictionary, because a single key can have multiple values.

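from collections import defaultdict

inp = "dimension implies direction implies measurement implies the more and the less"
l1 = inp.split()
l2 = [""] + l1          # same words shifted by one; "" precedes the first word
zipped = zip(l1, l2)    # zip stops at the end of the shorter list, l1
d = defaultdict(list)   # missing keys start out as an empty list
for k, v in zipped:
    d[k].append(v)
print(d)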

If you don't want to import anything, initialize the dict with an empty list for every word and then use the same logic:

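inp = "dimension implies direction implies measurement implies the more and the less"
l1 = inp.split()
l2 = [""] + l1              # same words shifted by one; "" precedes the first word
zipped = zip(l1, l2)
d = {x: [] for x in l1}     # every word starts with its own empty list
for k, v in zipped:
    d[k].append(v)
print(d)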

Upvotes: 2
