Reputation: 183
I am trying to write code that takes a sentence:
dimension implies direction implies measurement implies the more and the less
and converts it into a dictionary in which each word is a key and its value is the word (or words) that come immediately before it. The first word should have NO value.
It should essentially be:
{'and' : 'more'
'dimension' : ''
'direction' : 'implies'
'implies' : 'dimension', 'direction', 'measurement'
'less' : 'the'
'measurement' :'implies'
'more' : 'the'
'the' : 'and', 'implies'}
I wrote:
def get_previous_words_dict(text):
    words_list = text.split()
    sentence_dict = {}
    for i in range(0,len(words_list)):
        sentence_dict[words_list[i]] = words_list[i-1]
BUT it doesn't add the new value to a key's existing values; it replaces them. So instead of getting 3 different values for 'implies', I am only getting 1.
Also, instead of assigning NO value to the word 'dimension', it is assigning it 'less' (for i = 0, the index i-1 is -1, which refers to the last word).
Upvotes: 5
Views: 8164
Reputation: 55479
Here's how to do it without a defaultdict:
text = 'dimension implies direction implies measurement implies the more and the less'
sentence_dict = {}
prev = ''
for word in text.split():
    if word not in sentence_dict:
        sentence_dict[word] = []
    sentence_dict[word].append(prev)
    prev = word
print(sentence_dict)
output
{'and': ['more'], 'direction': ['implies'], 'implies': ['dimension', 'direction', 'measurement'], 'less': ['the'], 'measurement': ['implies'], 'the': ['implies', 'and'], 'dimension': [''], 'more': ['the']}
Here's a more compact way, using setdefault:
text = 'dimension implies direction implies measurement implies the more and the less'
sentence_dict = {}
prev = ''
for word in text.split():
    sentence_dict.setdefault(word, []).append(prev)
    prev = word
print(sentence_dict)
The previous version is probably a little easier to read.
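For reference, the same loop can be wrapped in the function from the question. This is only a sketch, and it rests on one interpretation: that "NO value" for the first word means an empty list rather than ['']. The question does not say so explicitly. The None sentinel is an illustrative choice as well.

```python
def get_previous_words_dict(text):
    """Map each word to the list of words that immediately precede it."""
    sentence_dict = {}
    prev = None  # assumption: None marks "there is no previous word yet"
    for word in text.split():
        # Make sure every word gets a key, even the very first one.
        previous_words = sentence_dict.setdefault(word, [])
        if prev is not None:
            previous_words.append(prev)
        prev = word
    return sentence_dict


text = 'dimension implies direction implies measurement implies the more and the less'
result = get_previous_words_dict(text)
print(result['implies'])    # ['dimension', 'direction', 'measurement']
print(result['dimension'])  # []
```

If the empty string is preferred as the marker for "no previous word", initializing prev to '' and dropping the if test gives the behaviour of the two snippets above.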
Upvotes: 7
Reputation: 4449
If you're not allowed to import anything, then a nifty reduce operation together with slicing and zip (all of these are Python 2 built-ins, requiring no imports; in Python 3, reduce moved to functools) might be a very compact way to do it:
EDIT: After it was pointed out to me that I had misunderstood the problem, I fixed it by changing the zip() statement.
from functools import reduce  # built-in on Python 2; must be imported on Python 3

# the string - split it immediately into a list of words
words = "dimension implies direction implies measurement implies the more and the less".split()
# There is a **lot** going on in this line of code, explanation below.
result = reduce(lambda acc, kv: acc.setdefault(kv[0], []).append(kv[1]) or acc,
                zip(words[1:], words[:-1]), {})
# this was the previous - incorrect - zip()
#               zip(words[1::2], words[0::2]), {})
And outputting the result (also edited)
print(result)
{'and': ['more'], 'direction': ['implies'], 'implies': ['dimension',
'direction', 'measurement'], 'less': ['the'], 'measurement':['implies'],
'the': ['implies', 'and'], 'more': ['the']}
For completeness' sake, the old, erroneous, result:
print(result)
{'the': ['and'], 'implies': ['dimension', 'direction', 'measurement'], 'more': ['the']}
A bit of explanation
After having split the string into a list of words, we can index the individual words as words[i].
(Edited.) By the problem statement, each key of the resulting dict is a word that follows another word, and its value is the word that comes before it. So we must pair each word with the word preceding it. With n = len(words), the list of keys will be [words[1], words[2], ..., words[n-1]] and the values that go with them are [words[0], words[1], ..., words[n-2]].
Using Python slicing: keys = words[1:] and values = words[:-1]
Now we need to create a dict of those keys and values, aggregating the values into a list if the same key occurs multiple times.
A dict has a method .setdefault(key, value) which initializes key's value to value if key is not in the dict yet; either way, it returns the value now stored under key. By default-initializing all values to the empty list ([]) we can blindly call .append(...) on the result. That's what this part of the code does:
acc.setdefault(key, []).append( value )
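A small sketch of that setdefault behaviour in isolation may help. The names d and values are only illustrative and do not appear in the answer's code.

```python
d = {}

# Key is missing: setdefault stores the default and returns that same list.
values = d.setdefault('implies', [])
values.append('dimension')

# Key is present: the default is ignored and the stored list is returned.
d.setdefault('implies', []).append('direction')

print(d)  # {'implies': ['dimension', 'direction']}
```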
Then there is reduce. A reduce operation reduces (...) a list of values into one. In this case we will reduce a list of (key, value) tuples into a dict in which all values are accumulated under their respective key.
reduce takes a callback reduction function and an initial element. The initial element here is the empty dict {} - we'll be filling that in as we go along.
The callback reduction function is called repeatedly with two arguments, the accumulator and the next element to add to the accumulation. The function should return the new accumulator.
In this code, the reduction step is basically the addition of the element's value to the list of values for the element's key. (See above - that's what the .setdefault().append() does).
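The reduction can also be written with a named function instead of a lambda, which may make the accumulator hand-off easier to follow. This is a sketch only: add_pair, pairs and grouped are hypothetical names, and the import is there so that it also runs on Python 3.

```python
from functools import reduce  # needed on Python 3; a built-in on Python 2


def add_pair(acc, pair):
    """Reduction step: file the pair's value under the pair's key."""
    key, value = pair
    acc.setdefault(key, []).append(value)
    return acc  # hand the same dict back as the next accumulator


pairs = [('implies', 'dimension'), ('direction', 'implies'), ('implies', 'direction')]
grouped = reduce(add_pair, pairs, {})
print(grouped)
```

Here grouped ends up mapping 'implies' to ['dimension', 'direction'] and 'direction' to ['implies'].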
All we need is to get a list of (key, value) tuples that we need to process. That's where the built-in zip appears. zip takes two lists and pairs up their corresponding elements as tuples (returned as a list in Python 2, as a lazy iterator in Python 3).
Thus:
zip(words[1:], words[:-1])
produces exactly what we want: the list of all (key, value) tuples.
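To see the slicing and the zip at work, here is a short sketch on a shortened sentence. The shortened input and the names keys, values and pairs are chosen only for illustration.

```python
words = "dimension implies direction implies measurement".split()

keys = words[1:]     # every word except the first
values = words[:-1]  # every word except the last

# list() is only needed on Python 3, where zip returns a lazy iterator.
pairs = list(zip(keys, values))
print(pairs)
# [('implies', 'dimension'), ('direction', 'implies'),
#  ('implies', 'direction'), ('measurement', 'implies')]
```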
Finally, because the reducing function needs to return the new accumulator, we have to play a trick. list.append(...) returns None, even though the actual dict has been modified. Thus we cannot return that value as the next accumulator. So we add the construction or acc after that.
Because the left-hand side of the logical or always evaluates to None, which is falsy in Python, the right-hand side is always evaluated - in this case the (modified) dict itself. The or expression as a whole therefore evaluates to the modified dict, which is exactly what we need to return.
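The trick can be checked in isolation with a few lines. This is a minimal sketch; returned and step_result are illustrative names, not part of the one-liner above.

```python
acc = {}

# list.append mutates the list in place and returns None.
returned = acc.setdefault('the', []).append('implies')
print(returned)  # None

# None is falsy, so `None or acc` evaluates to acc itself.
step_result = acc.setdefault('the', []).append('and') or acc
print(step_result is acc)  # True
print(step_result)         # {'the': ['implies', 'and']}
```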
Upvotes: 0
Reputation: 1625
Just split the string into a list, then create a second list by prepending an empty string so that it is offset by one. Zip the two lists and build the dictionary by iterating over the pairs. PS: use a defaultdict initialized with list instead of a plain dictionary, because a single key can have multiple values.
from collections import defaultdict

inp = "dimension implies direction implies measurement implies the more and the less"
l1 = inp.split()
l2 = [""] + l1  # the same words, shifted right by one position
zipped = zip(l1, l2)  # pairs of (word, previous word)
d = defaultdict(list)
for k, v in zipped:
    d[k].append(v)
print(d)
If you don't want to import anything, initialize the dict so that every word maps to an empty list, then use the same logic:
inp = "dimension implies direction implies measurement implies the more and the less"
l1 = inp.split()
l2 = [""] + l1  # the same words, shifted right by one position
zipped = zip(l1, l2)
d = {x: [] for x in l1}
for k, v in zipped:
    d[k].append(v)
print(d)
Upvotes: 2