Sam

Reputation: 67

Creating a program that returns a score by using a key on a list

I'm trying to read a txt file, remove every character that isn't a letter of the alphabet (A-Z), and then produce an output that lists each word in the file with its score next to it. To get the score, I compare each letter of the word to a key that says how much each letter is worth. Adding up the letter values for a given word gives the total score for that word.

alphakey = {'a': 5, 'b': 7, 'c': 4, 'd': 3, 'e': 7, 'f': 3,
         'g': 3, 'h': 5, 'i': 2, 'j': 2, 'k': 1, 'l': 2,
         'm': 6, 'n': 3, 'o': 1, 'p': 2, 'q': 1, 'r': 4,
         's': 3, 't': 7, 'u': 5, 'v': 5, 'w': 2, 'x': 1,
         'y': 2, 'z': 9}

This is what I have so far, but I'm completely stuck.

with open("hunger_games.txt") as p:
    text = p.read()
    text = text.lower()

text = text.split()
new = []
for word in text:
    if word.isalpha() == False:
        new.append(word[:-1])
    else:
        new.append(word)

class TotalScore():

    def score():
        total = 0
        for word in new:
            for letter in word:
                total += alphakey[letter]
            return total

I'm trying to get something like:

   you 5
   by 4
   cool 10

etc. for all the words in the list. Thanks in advance for any help.

Upvotes: 1

Views: 351

Answers (3)

TripleD

Reputation: 339

Does the punctuation have to be removed? Or are you doing that so that you can match up the keys of the dictionary? If you are okay with the punctuation staying in then this can be solved in a few lines:

alphakey = {'a': 5, 'b': 7, 'c': 4, 'd': 3, 'e': 7, 'f': 3,
     'g': 3, 'h': 5, 'i': 2, 'j': 2, 'k': 1, 'l': 2,
     'm': 6, 'n': 3, 'o': 1, 'p': 2, 'q': 1, 'r': 4,
     's': 3, 't': 7, 'u': 5, 'v': 5, 'w': 2, 'x': 1,
     'y': 2, 'z': 9}

with open("hunger_games.txt") as p:
    text = p.read()
    text = text.lower()

    words = text.split()
    uniqueWords = {}

    for word in words:
        if word not in uniqueWords:
            uniqueWords[word] = sum([alphakey[letter] for letter in word if letter.isalpha()])

    print(uniqueWords)

The line that computes the score might need a bit of explanation. First off,

[alphakey[letter] for letter in word if letter.isalpha()]

is an example of something called a "list comprehension". List comprehensions are a very useful feature of Python that lets us create an entire list in a single line. The one I just listed goes through every letter in "word" and, if the letter is alphabetical, looks up its value in "alphakey". For example, if the word was:

"hello"

it would return the list:

[5, 7, 2, 2, 1]

If the word was:

"w4h&t"

the list comprehension would ignore the "4" and "&" and return the list:

[2, 5, 7]

To turn those into a single value, we wrap the comprehension in the sum function. So the final value is 17 for the word "hello", and 14 for "w4h&t".
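To check the two worked examples, here is a minimal, self-contained sketch. The helper name word_score is an assumption made for illustration; it does not appear in the code above.

```python
# Letter values copied from the question.
alphakey = {'a': 5, 'b': 7, 'c': 4, 'd': 3, 'e': 7, 'f': 3,
            'g': 3, 'h': 5, 'i': 2, 'j': 2, 'k': 1, 'l': 2,
            'm': 6, 'n': 3, 'o': 1, 'p': 2, 'q': 1, 'r': 4,
            's': 3, 't': 7, 'u': 5, 'v': 5, 'w': 2, 'x': 1,
            'y': 2, 'z': 9}


def word_score(word):
    """Hypothetical helper: sum the values of the alphabetic characters."""
    values = [alphakey[letter] for letter in word if letter.isalpha()]
    return sum(values)


print(word_score("hello"))  # 5 + 7 + 2 + 2 + 1 = 17
print(word_score("w4h&t"))  # 2 + 5 + 7 = 14
```

One caveat: str.isalpha() is also true for non-ASCII letters such as "é", which are not in alphakey and would raise a KeyError. If the file may contain such characters, filtering with `if letter in alphakey` instead is a safer test.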

Upvotes: 1

cglacet

Reputation: 10912

As pointed out in the comments, you don't need a class for this, and your return is mis-indented: it sits inside the outer for loop, so the function returns after the first word. Otherwise, I think your score function does what you need to compute the total score.
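A minimal sketch of that fix, assuming the goal of score is a single total for the whole word list: a plain function with the return moved out of the loops. The parameter name words, the sample list, and the use of alphakey.get(letter, 0) to skip characters that have no value are assumptions, not part of the question's code.

```python
# Letter values copied from the question.
alphakey = {'a': 5, 'b': 7, 'c': 4, 'd': 3, 'e': 7, 'f': 3,
            'g': 3, 'h': 5, 'i': 2, 'j': 2, 'k': 1, 'l': 2,
            'm': 6, 'n': 3, 'o': 1, 'p': 2, 'q': 1, 'r': 4,
            's': 3, 't': 7, 'u': 5, 'v': 5, 'w': 2, 'x': 1,
            'y': 2, 'z': 9}


def score(words):
    """Return the combined score of every word in words."""
    total = 0
    for word in words:
        for letter in word:
            # Assumption: characters missing from alphakey count as 0.
            total += alphakey.get(letter, 0)
    # Returning after both loops means every word contributes to the total.
    return total


print(score(["you", "by", "cool"]))  # 8 + 9 + 8 = 25
```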

If you need to have a per-word score you can make use of a dictionary (again), to store these:

import re

def word_score(word):
  return sum(alphakey[l] for l in word)

def text_scores(filename):
  with open(filename) as p:
    text = p.read()
  # Keep letters and whitespace only, so words on separate lines are not glued together.
  text = re.sub(r'[^a-z\s]', '', text.lower())
  return {w: word_score(w) for w in text.split()}

print(text_scores("hunger_games.txt"))

If hunger_games.txt contains "you by cool", then this prints:

{'you': 8, 'by': 9, 'cool': 8}
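To get the one-word-per-line layout the question asks for, the dictionary can be printed item by item. A small sketch, assuming a scores dictionary like the one shown above; the helper name format_scores is hypothetical.

```python
def format_scores(scores):
    """Return one 'word score' line per dictionary entry, in insertion order."""
    return [f"{word} {value}" for word, value in scores.items()]


scores = {'you': 8, 'by': 9, 'cool': 8}
for line in format_scores(scores):
    print(line)
# you 8
# by 9
# cool 8
```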

Upvotes: 1

marzique

Reputation: 694

I suggest using nltk for text manipulation. Here is my solution (some chunks of the code could be shortened; I kept it verbose so it is easier to follow).

Basically, you split the text into a list of words, remove all duplicates with the set() function, and then loop through the words calculating each score. I hope the code is clear.

import nltk

alphakey = {'a': 5, 'b': 7, 'c': 4, 'd': 3, 'e': 7, 'f': 3,
         'g': 3, 'h': 5, 'i': 2, 'j': 2, 'k': 1, 'l': 2,
         'm': 6, 'n': 3, 'o': 1, 'p': 2, 'q': 1, 'r': 4,
         's': 3, 't': 7, 'u': 5, 'v': 5, 'w': 2, 'x': 1,
         'y': 2, 'z': 9}

text = """
boy girl girl boy dog Dog car cAr dog girl you by cool 123asd .asd; 12asd
"""

words = []
results = {}

sentences = nltk.sent_tokenize(text)
for sentence in sentences:
    words += nltk.word_tokenize(sentence)

words = list(set([word.lower() for word in words]))

for word in words:
    if word.isalpha():
        total = 0
        for letter in word:
            total += alphakey[letter]
        results[word] = total


for val in results:
    print(f"{val} {results[val]}")

Output (the order may vary between runs, because a set is unordered):

dog 7
you 8
by 9
boy 10
cool 8
car 13
girl 11

Upvotes: 0
