Kenny I.

Reputation: 1

Sorting and counting words from a text file

I'm new to programming and stuck on my current program. I have to read in a story from a file, sort the words, and count the number of occurrences of each word. My program counts the words, but it doesn't sort them, remove the punctuation, or remove duplicate words. I'm lost as to why it's not working. Any advice would be helpful.

ifile = open("Story.txt",'r')
fileout = open("WordsKAI.txt",'w')
lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    wordlist.append(line)
    line = line.split()
    # line.lower()

    for word in line:
        word = word.strip(". ,  ! ? :  ")
        # word = list(word)
        wordlist.sort()
        sorted(wordlist)
        countlist.append(word)

        print(word,countlist.count(word))

Upvotes: 0

Views: 2406

Answers (3)

Moon Cheesez

Reputation: 2701

The main problem in your code is this line (line 9):

    wordlist.append(line)

You are appending the whole line to wordlist, which I doubt is what you want. Because of this, what ends up in wordlist is never split into words or .strip()ed.
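To make the difference concrete, here is a small sketch comparing the two. The sample line is made up; it only stands in for one entry of what readlines() returns:

```python
line = "The cat sat.\n"  # made-up line, shaped like a readlines() entry

whole = []
whole.append(line)  # one entry: the entire line, newline and punctuation included

# One cleaned entry per word: split first, then strip punctuation from each piece
words = [w.strip(".,!?:") for w in line.split()]

print(whole)
print(words)
```

Sorting the first list orders whole lines, not words, which is why the output never looks sorted by word.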

Instead, add each word only after you have strip()ed it, and only if it is not already in wordlist (no duplicates):

# Using "with" closes the file automatically once the lines are read
with open("Story.txt", 'r') as ifile:
    lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Add the word into wordlist only if it is not in wordlist
        if word not in wordlist:
            wordlist.append(word)

        # Add the word to countlist so that it can be counted later
        countlist.append(word)

# Sort the wordlist
wordlist.sort()

# Print the wordlist
for word in wordlist:
    print(word, countlist.count(word))

Another way to do this is with a dictionary, storing each word as the key and its number of occurrences as the value:

# Using "with" closes the file automatically once the lines are read
with open("Story.txt", "r") as ifile:
    lines = ifile.readlines()

word_dict = {}

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Add the word to word_dict
        word_dict[word] = word_dict.get(word, 0) + 1

# Create a wordlist to display the words sorted
word_list = list(word_dict.keys())
word_list.sort()

for word in word_list:
    print(word, word_dict[word])
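As a further variation on the dictionary approach, the standard library's collections.Counter can do the same bookkeeping. This is only a sketch under a few assumptions: the helper name count_words and the sample lines are made up for illustration, and the punctuation set is the same one used above.

```python
from collections import Counter


def count_words(lines, punctuation=".,!?:;'\""):
    """Count lowercase, punctuation-stripped words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            word = word.strip(punctuation).lower()
            if word:  # skip tokens that were nothing but punctuation
                counts[word] += 1
    return counts


# Hypothetical sample text standing in for the contents of Story.txt
sample = ["The cat sat.", "The dog sat, too!"]
counts = count_words(sample)

# Print the words alphabetically with their counts
for word in sorted(counts):
    print(word, counts[word])
```

With a real file you could pass the open file object (or the list from readlines()) in place of sample.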

Upvotes: 1

inspectorG4dget

Reputation: 114035

punctuation = ".,!?: "
counts = {}
with open("Story.txt",'r') as infile:
    for line in infile:
        for word in line.split():
            # strip() removes any of these characters from both ends in one call
            word = word.strip(punctuation)
            if word not in counts:
                counts[word] = 0
            counts[word] += 1

with open("WordsKAI.txt",'w') as outfile:
    for word in sorted(counts):  # if you want to sort by counts instead, use sorted(counts, key=counts.get)
        outfile.write("{}: {}\n".format(word, counts[word]))
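If you would rather list the most frequent words first, one possible sketch is below. Note that sorted(counts, key=counts.get) on its own sorts in ascending order of count. The helper name by_count and the example dictionary are made up for illustration; the dictionary just has the same shape as counts above.

```python
def by_count(counts):
    """Return (word, count) pairs, highest count first, ties in alphabetical order."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical counts, shaped like the dictionary the counting loop builds
example_counts = {"the": 3, "cat": 1, "sat": 2, "dog": 1}

for word, n in by_count(example_counts):
    print("{}: {}".format(word, n))
```

Negating the count in the key gives descending counts while keeping ties alphabetical, which reverse=True alone would not do.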

Upvotes: 0

Artem Kondakov

Reputation: 1

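sorted() and list.sort() compare strings case-sensitively by default, so capitalized words sort ahead of lowercase ones. To sort case-insensitively, pass a key function:

    r = sorted(wordlist, key=str.lower)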
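A quick sketch of the difference the key makes, using a made-up word list:

```python
words = ["banana", "Cherry", "apple"]  # made-up sample

# Default comparison: uppercase letters sort before lowercase ones
default_order = sorted(words)

# key=str.lower compares lowercased copies but returns the original strings
folded_order = sorted(words, key=str.lower)

print(default_order)
print(folded_order)
```

Keep in mind that sorted() returns a new list and leaves the original untouched, so the result has to be assigned to a name to be of any use.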

Upvotes: 0
