Reputation: 439
Every letter in the English language has a percentage of occurrence. These are the percentages:
A      B      C      D      E      F      G      H      I
.0817  .0149  .0278  .0425  .1270  .0223  .0202  .0609  .0697
J      K      L      M      N      O      P      Q      R
.0015  .0077  .0402  .0241  .0675  .0751  .0193  .0009  .0599
S      T      U      V      W      X      Y      Z
.0633  .0906  .0276  .0098  .0236  .0015  .0197  .0007
A list called letterGoodness is predefined as:
letterGoodness = [.0817,.0149,.0278,.0425,.1270,.0223,.0202,...
I need to find the "goodness" of a string. For example, the goodness of 'I EAT' is: .0697 + .1270 + .0817 + .0906 = .369. This is part of a bigger problem, but I need to solve this first. I started like this:
def goodness(message):
    for i in L:
        for j in i:
So it will be enough to find out how to get the occurrence percentage of any character. Can you help me? The string contains only uppercase letters and spaces.
Upvotes: 4
Views: 306
Reputation: 6017
You would be better off using a dictionary data structure.
EDIT: This is not my original code but instead the code updated along the lines DSM suggested.
import string
num_vals = [.0817, .0149, .0278, .0425, .1270, .0223, .0202, .0609, .0697, .0015, .0077,
            .0402, .0241, .0675, .0751, .0193, .0009, .0599, .0633, .0906, .0276, .0098,
            .0236, .0015, .0197, .0007]
# Pair each uppercase letter with its value (zip works in both Python 2 and 3).
letterGoodness = {letter: value for letter, value in zip(string.ascii_uppercase, num_vals)}
def goodness(message):
    string_goodness = 0
    for letter in message:
        letter = letter.upper()
        # Characters without an entry, such as spaces, are skipped.
        if letter in letterGoodness:
            string_goodness += letterGoodness[letter]
    return string_goodness
print(goodness("I eat"))
Using the test case you provided:
print(goodness("I eat"))
yields the output (up to floating-point rounding):
0.369
One thing to note: building a dictionary with a dict comprehension, as is done here, requires Python 2.7+. The same thing can be accomplished in Python 2.6+ with the dict() constructor.
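A minimal sketch of the dict() constructor route, assuming the same 26 values as num_vals above; zip pairs each uppercase letter with its value, so no dict comprehension is needed:

```python
import string

# The 26 letter values from the question, in A..Z order.
num_vals = [.0817, .0149, .0278, .0425, .1270, .0223, .0202, .0609, .0697,
            .0015, .0077, .0402, .0241, .0675, .0751, .0193, .0009, .0599,
            .0633, .0906, .0276, .0098, .0236, .0015, .0197, .0007]

# dict() accepts an iterable of (key, value) pairs, which zip provides.
letterGoodness = dict(zip(string.ascii_uppercase, num_vals))
```

The resulting dictionary should be interchangeable with the one built by the comprehension.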
Upvotes: 2
Reputation: 310227
letterGoodness is better as a dictionary. Then you can just do:
sum(letterGoodness.get(c, 0) for c in yourstring.upper())
# .upper() is there for defensive programming
To convert letterGoodness from your list to a dictionary, you can do:
import string
letterGoodness = dict(zip(string.ascii_uppercase,letterGoodness))
If you're guaranteed to only have uppercase letters and spaces, you can do:
letterGoodness = dict(zip(string.ascii_uppercase,letterGoodness))
letterGoodness[' '] = 0
sum(letterGoodness[c] for c in yourstring)
but the performance gains here are probably pretty minimal so I would favor the more robust version above.
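To make the trade-off concrete, here is a rough sketch of both variants side by side. The names letter_values, goodness_robust, goodness_strict and strict_table are made up for illustration, and letter_values stands in for the question's predefined list:

```python
import string

# Stand-in for the question's predefined list (A..Z order).
letter_values = [.0817, .0149, .0278, .0425, .1270, .0223, .0202, .0609, .0697,
                 .0015, .0077, .0402, .0241, .0675, .0751, .0193, .0009, .0599,
                 .0633, .0906, .0276, .0098, .0236, .0015, .0197, .0007]

letterGoodness = dict(zip(string.ascii_uppercase, letter_values))

def goodness_robust(message):
    # Unknown characters (spaces, punctuation) count as 0.
    return sum(letterGoodness.get(c, 0) for c in message.upper())

# Strict table: only uppercase letters and the space are valid keys.
strict_table = dict(letterGoodness)
strict_table[' '] = 0

def goodness_strict(message):
    # Raises KeyError on anything outside uppercase letters and spaces.
    return sum(strict_table[c] for c in message)

print(round(goodness_robust("I eat"), 4))  # 0.369
print(round(goodness_strict("I EAT"), 4))  # 0.369

try:
    goodness_strict("I eat")
except KeyError as err:
    print("strict version rejected character:", err)
```

The strict version only works if the input really is limited to uppercase letters and spaces; the robust one tolerates anything.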
If you insist on keeping letterGoodness as a list (and I don't advise that), you can use the builtin ord to get the index (pointed out by cwallenpoole):
ordA = ord('A')
sum(letterGoodness[ord(c)-ordA] for c in yourstring if c in string.ascii_uppercase)
I'm too lazy to timeit right now, but you may also want to define a temporary set to hold string.ascii_uppercase. It might make your function run a little faster (depending on how optimized str.__contains__ is compared to set.__contains__):
ordA = ord('A')
big_letters = set(string.ascii_uppercase)
sum(letterGoodness[ord(c)-ordA] for c in yourstring.upper() if c in big_letters)
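If you want to check this yourself, something along these lines should do as a rough timeit sketch. The sample string and the two helper functions are made up for illustration, and the numbers will vary by machine and Python version:

```python
import string
import timeit

# Hypothetical test input: uppercase letters and spaces, repeated.
sample = "I EAT AND THE QUICK BROWN FOX JUMPS " * 100

big_letters = set(string.ascii_uppercase)

def count_with_str():
    # Membership test against the 26-character string.
    return sum(1 for c in sample if c in string.ascii_uppercase)

def count_with_set():
    # Membership test against a set of the same 26 characters.
    return sum(1 for c in sample if c in big_letters)

# timeit.timeit accepts a zero-argument callable and a repeat count.
str_time = timeit.timeit(count_with_str, number=200)
set_time = timeit.timeit(count_with_set, number=200)

print("str membership: %.4f s" % str_time)
print("set membership: %.4f s" % set_time)
```

Both helpers count the same characters, so any difference in the timings comes from the membership test alone.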
Upvotes: 11