Reputation: 85
sentence = "one fish two fish red fish blue fish one red two blue"
sentence = 'start ' + sentence + ' end'
word_list = sentence.split(' ')
d = {}
# map each word to the word that follows it (later pairs overwrite earlier ones)
for i in range(len(word_list) - 1):
    d[word_list[i]] = word_list[i + 1]
print(word_list)
print(d)
This gives me the word_list:
['start', 'one', 'fish', 'two', 'fish', 'red',\
'fish', 'blue', 'fish', 'one', 'red', 'two',\
'blue', 'end']
and the d:
{'blue': 'end', 'fish': 'one', 'two': 'blue',\
'one': 'red', 'start': 'one', 'red': 'two'}
But I need a dict whose values are lists of every word that follows the key word. For example, the word 'fish' is followed by 4 words, so I need:
'fish':['two', 'red', 'blue', 'one']
'blue' is followed by 'fish' and 'end', so:
'blue':['fish', 'end']
etc.
Please, any ideas?
This is the first step toward generating a random sentence.
Thanks))
Upvotes: 2
Views: 116
Reputation: 23556
you may try this:
from collections import defaultdict
sentence = "one fish two fish red fish blue fish one red two blue"
word_list = sentence.split()
d = defaultdict(list)
# pair each word with its successor and collect the successors per word
for a, b in zip(word_list, word_list[1:]):
    d[a].append(b)
print(d)
it gives (formatted here for readability):
{
"blue": [ "fish" ],
"fish": [ "two", "red", "blue", "one" ],
"two": [ "fish", "blue" ],
"red": [ "fish", "two" ],
"one": [ "fish", "red" ]
}
Also, you don't need to add 'start' and 'end' to avoid accessing elements beyond the list size: zip stops as soon as the shorter of its two arguments runs out.
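Since the question mentions random sentence generation as the next step, here is a rough sketch of how the table could be used for that. It is only one way to do it, under some assumptions: the names build_followers and random_sentence are made up for this example, the 'start' and 'end' markers are kept (here they do serve a purpose, as the places where a walk begins and stops), and the input is assumed not to contain the words 'start' or 'end' itself.

```python
import random
from collections import defaultdict

def build_followers(sentence):
    # Hypothetical helper: map each word to the list of words that follow it.
    # 'start' and 'end' are marker words, assumed not to occur in the input.
    words = ['start'] + sentence.split() + ['end']
    followers = defaultdict(list)
    for a, b in zip(words, words[1:]):
        followers[a].append(b)
    return followers

def random_sentence(followers, rng=random, max_words=20):
    # Walk from 'start', picking a random follower at each step, and stop
    # at 'end' or once max_words words have been collected.
    out = []
    word = rng.choice(followers['start'])
    while word != 'end' and len(out) < max_words:
        out.append(word)
        word = rng.choice(followers[word])
    return ' '.join(out)

followers = build_followers("one fish two fish red fish blue fish one red two blue")
print(random_sentence(followers))
```

Because a follower appears in its list once per occurrence, random.choice picks more frequent successors more often. The max_words cap is there because a walk through a cycle such as fish, two, fish could otherwise run for a long time.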
Upvotes: 4