Lindsay Rae

Reputation: 31

Markov chain in Python (beginner)

I am new to Python and attempting to make a Markov chain. Other examples use object instances, and I haven't gone quite that far. I haven't written the random selection of values yet, but I am at a loss to explain the output of my code so far.

filename = open("dr-suess.txt")

def make_list(filename):
    """make file a list and a list of tuple tup_pairs"""
    file_string = filename.read()  #read whole file
    file_list = file_string.split()   #split on whitespace (not worrying about 
                                      # punctuation right now)
    tup_pairs = []
    for i in range(len(file_list)-1):  
        tup_pairs.append((file_list[i], file_list[i+1]))  #making my tuple pair list
    return tup_pairs, file_list  

def mapping(filename):
    tup_pairs, file_list = make_list(filename)  
    dictionary = {} 
    for pair in tup_pairs:
        dictionary[pair] = []  #setting the value of dict to empty list
    tup_pairs = set(tup_pairs)   #throwing out repeated tuples 
    for word in file_list:
        word_number = file_list.index(word)  #index number of iter word
        if word_number > 1:   #because there is no -2/-1 index 
            compared_tuple = (file_list[word_number-2], file_list[word_number-1]) #to find
                                                            #preceding pair to compare
            for pair in tup_pairs:
                if compared_tuple == pair: 
                    dictionary[pair].append(word)  #should append the word to my dict value (list)

    print dictionary  #getting weird results (some words appear that shouldn't, some
                   # don't appear that should)

mapping(filename)

output:

Lindsays-MBP:markov lindsayg$ python markov.py 
{('a', 'fox?'): [], ('Sam', 'I'): ['am?'], ('you,', 'could'): ['you', 'you', 'you', 'you', 'you', 'you'], ('could', 'you'): ['in', 'with', 'in', 'with'], ('you', 'with'): [], ('box?', 'Would'): [], ('ham?', 'Would'): [], ('I', 'am?'): [], ('you', 'in'): ['a', 'a', 'a', 'a'], ('a', 'house?'): [], ('like', 'green'): ['eggs'], ('like', 'them,'): ['Sam'], ('and', 'ham?'): [], ('Would', 'you'): ['like', 'like'], ('a', 'mouse?'): [], ('them,', 'Sam'): ['I'], ('in', 'a'): ['house?', 'box?'], ('with', 'a'): ['mouse?', 'fox?'], ('house?', 'Would'): [], ('a', 'box?'): [], ('Would', 'you,'): ['could', 'could', 'could', 'could'], ('green', 'eggs'): ['and'], ('you', 'like'): ['green', 'them,'], ('mouse?', 'Would'): [], ('fox?', 'Would'): [], ('eggs', 'and'): ['ham?']}

One example of the weird output (there should be only four 'you' values, but there are six):

('you,', 'could'): ['you', 'you', 'you', 'you', 'you', 'you']

FYI, the text file being used:

Would you, could you in a house?
Would you, could you with a mouse?
Would you, could you in a box?
Would you, could you with a fox?
Would you like green eggs and ham?
Would you like them, Sam I am?

Upvotes: 3

Views: 2093

Answers (1)

Francis Colas

Reputation: 3647

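Your problem is the way you find the index of the word: index returns the position of the first occurrence only. The text contains six 'you' (and four 'you,', which are a different string), and every 'you' gets the same index, word_number = 3, so all six are appended to the pair that precedes that first occurrence, ('you,', 'could'). The same effect explains the empty lists: a repeated word such as 'Would' always resolves to index 0, so it is never appended to the pairs that actually precede its later occurrences.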
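
You can see the behaviour in isolation with a small sketch. It uses only the first and fifth lines of your text, and the names words and positions are just for illustration:

```python
# list.index returns the position of the FIRST matching element,
# regardless of which occurrence the loop is currently visiting.
words = "Would you, could you in a house? Would you like green eggs and ham?".split()

first = words.index("you")
print(first)        # 3, and it is 3 for every 'you' in the list

# enumerate yields the actual position of each occurrence.
positions = [i for i, w in enumerate(words) if w == "you"]
print(positions)    # [3, 8]
```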

To get the index, you should use the built-in enumerate:

for word_number, word in enumerate(file_list):
    ...
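
For completeness, here is one way the whole mapping step could look with enumerate. This is only a sketch, not the only way to do it: build_chain is a name I made up, the text is inlined instead of read from your file, and I used collections.defaultdict plus a direct dictionary lookup in place of your inner loop over tup_pairs. It is written with print() as a function, so it assumes Python 3.

```python
from collections import defaultdict

def build_chain(words):
    """Map each pair of consecutive words to the list of words that follow it."""
    chain = defaultdict(list)
    for i, word in enumerate(words):
        if i > 1:  # the first two words have no preceding pair
            chain[(words[i - 2], words[i - 1])].append(word)
    return dict(chain)

# Text inlined here for the example; in your script it comes from the file.
text = """Would you, could you in a house?
Would you, could you with a mouse?
Would you, could you in a box?
Would you, could you with a fox?
Would you like green eggs and ham?
Would you like them, Sam I am?"""

chain = build_chain(text.split())
print(chain[("you,", "could")])   # ['you', 'you', 'you', 'you']
print(chain[("a", "fox?")])       # ['Would']
```

One difference from your version: a pair with no follower, such as the final ('I', 'am?'), is simply absent from the result instead of mapping to an empty list, so use chain.get(pair, []) if you need to look those up.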

Upvotes: 3
