Reputation: 31
I am new to Python and am attempting to make a Markov chain. Other examples I have seen use object instances, and I haven't gone quite that far. I haven't done the random selection of values yet, but I am at a loss to explain the output of my code so far.
filename = open("dr-suess.txt")

def make_list(filename):
    """make file a list and a list of tuple tup_pairs"""
    file_string = filename.read()  # read whole file
    file_list = file_string.split()  # split on whitespace (not worrying about
                                     # punctuation right now)
    tup_pairs = []
    for i in range(len(file_list)-1):
        tup_pairs.append((file_list[i], file_list[i+1]))  # making my tuple pair list
    return tup_pairs, file_list

def mapping(filename):
    tup_pairs, file_list = make_list(filename)
    dictionary = {}
    for pair in tup_pairs:
        dictionary[pair] = []  # setting the value of dict to empty list
    tup_pairs = set(tup_pairs)  # throwing out repeated tuples
    for word in file_list:
        word_number = file_list.index(word)  # index number of iter word
        if word_number > 1:  # because there is no -2/-1 index
            compared_tuple = (file_list[word_number-2], file_list[word_number-1])  # to find
                                                                                   # preceding pair to compare
            for pair in tup_pairs:
                if compared_tuple == pair:
                    dictionary[pair].append(word)  # should append the word to my dict value (list)
    print dictionary  # getting weird results (some words should appear that don't, some
                      # don't appear that should)

mapping(filename)
output:
Lindsays-MBP:markov lindsayg$ python markov.py
{('a', 'fox?'): [], ('Sam', 'I'): ['am?'], ('you,', 'could'): ['you', 'you', 'you', 'you', 'you', 'you'], ('could', 'you'): ['in', 'with', 'in', 'with'], ('you', 'with'): [], ('box?', 'Would'): [], ('ham?', 'Would'): [], ('I', 'am?'): [], ('you', 'in'): ['a', 'a', 'a', 'a'], ('a', 'house?'): [], ('like', 'green'): ['eggs'], ('like', 'them,'): ['Sam'], ('and', 'ham?'): [], ('Would', 'you'): ['like', 'like'], ('a', 'mouse?'): [], ('them,', 'Sam'): ['I'], ('in', 'a'): ['house?', 'box?'], ('with', 'a'): ['mouse?', 'fox?'], ('house?', 'Would'): [], ('a', 'box?'): [], ('Would', 'you,'): ['could', 'could', 'could', 'could'], ('green', 'eggs'): ['and'], ('you', 'like'): ['green', 'them,'], ('mouse?', 'Would'): [], ('fox?', 'Would'): [], ('eggs', 'and'): ['ham?']}
One example of weird output (there should only be 4 'you' values and there are six):
('you,', 'could'): ['you', 'you', 'you', 'you', 'you', 'you']
fyi file text being used:
Would you, could you in a house?
Would you, could you with a mouse?
Would you, could you in a box?
Would you, could you with a fox?
Would you like green eggs and ham?
Would you like them, Sam I am?
Upvotes: 3
Views: 2093
Reputation: 3647
Your problem is the way you find the index of the word: index returns the position of the first instance only. There are 6 'you' (and 4 'you,', which are different strings), and every 'you' gets the same index, word_number = 3, so they are all appended to the pair ('you,', 'could').
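A quick way to see this, as a minimal sketch using only the first two lines of the sample text (the variable names here are illustrative, not from the question):

```python
words = ("Would you, could you in a house? "
         "Would you, could you with a mouse?").split()

# list.index always returns the position of the FIRST match,
# so every occurrence of 'you' reports the same index.
first_hits = [words.index(w) for w in words if w == "you"]

# enumerate yields the actual position of each occurrence.
true_positions = [i for i, w in enumerate(words) if w == "you"]

print(first_hits)
print(true_positions)
```

The second 'you' sits at position 10, but index still reports 3 for it, which is why its preceding pair is looked up in the wrong place.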
To get the correct index for each word, use the built-in enumerate:
for word_number, word in enumerate(file_list):
    ...
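Putting that together, here is one possible sketch of the mapping step rewritten around enumerate. It is not the asker's exact function: it assumes the text is passed in as a string rather than a file object, the name build_chain is made up for illustration, and it uses dict.setdefault instead of pre-filling empty lists, so a pair with no following word (such as the final ('I', 'am?')) simply does not appear as a key.

```python
def build_chain(text):
    """Map each (word, next_word) pair to the list of words that follow it."""
    words = text.split()  # split on whitespace, punctuation left attached
    chain = {}
    for i, word in enumerate(words):
        if i > 1:  # the first two words have no preceding pair
            key = (words[i - 2], words[i - 1])
            # create the list on first sight of the pair, then append
            chain.setdefault(key, []).append(word)
    return chain


sample = """Would you, could you in a house?
Would you, could you with a mouse?
Would you, could you in a box?
Would you, could you with a fox?
Would you like green eggs and ham?
Would you like them, Sam I am?"""

chain = build_chain(sample)
print(chain[("you,", "could")])
```

Because the preceding pair is read directly from positions i - 2 and i - 1, there is no need to loop over all pairs looking for a match, and ('you,', 'could') ends up with the four 'you' entries the question expected.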
Upvotes: 3