A value in a list, python

Question

Every letter in the English language has a relative frequency of occurrence. These are the frequencies:

A       B       C       D       E       F       G       H       I
.0817   .0149   .0278   .0425   .1270   .0223   .0202   .0609   .0697
J       K       L       M       N       O       P       Q       R
.0015   .0077   .0402   .0241   .0675   .0751   .0193   .0009   .0599
S       T       U       V       W       X       Y       Z   
.0633   .0906   .0276   .0098   .0236   .0015   .0197   .0007

A list called letterGoodness is predefined as:

letterGoodness = [.0817,.0149,.0278,.0425,.1270,.0223,.0202,...

I need to find the "goodness" of a string. For example, the goodness of 'I EAT' is .0697 + .1270 + .0817 + .0906 = .369. This is part of a bigger problem, but I need to solve this part first. I started like this:

def goodness(message):
   for i in L:
     for j in i:

So it will be enough to find out how to get the occurrence percentage of any character. Can you help me? The string contains only uppercase letters and spaces.

mgilson · Accepted Answer

letterGoodness would be better as a dictionary. Then you can just do:

sum(letterGoodness.get(c,0) for c in yourstring.upper())
#                                             #^.upper for defensive programming

To convert letterGoodness from your list to a dictionary, you can do:

import string
letterGoodness = dict(zip(string.ascii_uppercase,letterGoodness))
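Putting those two pieces together, here is a minimal end-to-end sketch. The list values are copied from the table in the question; the names goodness_by_letter and goodness are my own choices for illustration, not anything predefined in your problem.

```python
import string

# Frequencies copied from the table in the question, in order A..Z.
letterGoodness = [
    .0817, .0149, .0278, .0425, .1270, .0223, .0202, .0609, .0697,
    .0015, .0077, .0402, .0241, .0675, .0751, .0193, .0009, .0599,
    .0633, .0906, .0276, .0098, .0236, .0015, .0197, .0007,
]

# Map each uppercase letter to its frequency (hypothetical name).
goodness_by_letter = dict(zip(string.ascii_uppercase, letterGoodness))

def goodness(message):
    """Sum the letter frequencies; any non-letter (e.g. a space) counts as 0."""
    return sum(goodness_by_letter.get(c, 0) for c in message.upper())

print(round(goodness('I EAT'), 4))  # 0.369
```

The result is rounded only for display, since summing floats can leave a tiny rounding error in the last digits.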

If you're guaranteed to only have uppercase letters and spaces, you can do:

letterGoodness = dict(zip(string.ascii_uppercase,letterGoodness))
letterGoodness[' '] = 0
sum(letterGoodness[c] for c in yourstring)

but the performance gains here are probably pretty minimal so I would favor the more robust version above.
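To make the trade-off concrete, here is a small sketch comparing the two dictionary versions. The function names goodness_strict and goodness_robust are hypothetical, and the table values are copied from the question. The strict lookup raises KeyError as soon as it sees a character that is not in the dictionary, while the .get version quietly scores it as 0.

```python
import string

# Frequencies copied from the table in the question, in order A..Z.
letterGoodness = [
    .0817, .0149, .0278, .0425, .1270, .0223, .0202, .0609, .0697,
    .0015, .0077, .0402, .0241, .0675, .0751, .0193, .0009, .0599,
    .0633, .0906, .0276, .0098, .0236, .0015, .0197, .0007,
]
table = dict(zip(string.ascii_uppercase, letterGoodness))
table[' '] = 0  # spaces contribute nothing

def goodness_strict(message):
    # Assumes only uppercase letters and spaces; anything else raises KeyError.
    return sum(table[c] for c in message)

def goodness_robust(message):
    # Upper-cases the input and scores unknown characters as 0.
    return sum(table.get(c, 0) for c in message.upper())

print(goodness_strict('I EAT') == goodness_robust('I EAT'))  # True

try:
    goodness_strict('I eat!')
except KeyError as err:
    print('strict version rejected character', err)

print(goodness_robust('I eat!') == goodness_robust('I EAT'))  # True
```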

If you insist on keeping letterGoodness as a list (and I don't advise that), you can use the built-in ord to get the index (pointed out by cwallenpoole):

 ordA = ord('A')
 sum(letterGoodness[ord(c)-ordA] for c in yourstring if c in string.ascii_uppercase)
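As a self-contained sketch of the list-based version: ord('A') is 65 and the uppercase letters are contiguous, so ord(c) - ord('A') gives 0 for 'A' up to 25 for 'Z'. The function name goodness_from_list is my own, and the list values are copied from the table in the question.

```python
import string

# Frequencies copied from the table in the question, in order A..Z.
letterGoodness = [
    .0817, .0149, .0278, .0425, .1270, .0223, .0202, .0609, .0697,
    .0015, .0077, .0402, .0241, .0675, .0751, .0193, .0009, .0599,
    .0633, .0906, .0276, .0098, .0236, .0015, .0197, .0007,
]

ORD_A = ord('A')  # 65; 'A' maps to index 0, 'Z' to index 25

def goodness_from_list(message):
    """Index the list by letter position, skipping anything that is not A-Z."""
    return sum(letterGoodness[ord(c) - ORD_A]
               for c in message
               if c in string.ascii_uppercase)

print(round(goodness_from_list('I EAT'), 4))  # 0.369
```

The membership filter matters here: without it, a space would give a negative index (32 - 65 = -33) and raise an IndexError on a 26-element list.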

I'm too lazy to timeit right now, but you may also want to define a temporary set to hold string.ascii_uppercase. It might make your function run a little faster (depending on how optimized str.__contains__ is compared to set.__contains__):

 ordA = ord('A')
 big_letters = set(string.ascii_uppercase)
 sum(letterGoodness[ord(c)-ordA] for c in yourstring.upper() if c in big_letters)
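If you want to check the str versus set question on your own machine, something like the following rough sketch would do. The sample text, repeat count, and function names are arbitrary choices of mine, and I'm deliberately not quoting any timings, since they depend on the interpreter and the input.

```python
import string
import timeit

upper_str = string.ascii_uppercase
upper_set = set(string.ascii_uppercase)

# Arbitrary sample text: uppercase letters and spaces only.
sample = 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG ' * 100

def count_with_str():
    # Membership test against the 26-character string.
    return sum(1 for c in sample if c in upper_str)

def count_with_set():
    # Membership test against a set of the same letters.
    return sum(1 for c in sample if c in upper_set)

# timeit.timeit accepts a callable and runs it `number` times.
str_time = timeit.timeit(count_with_str, number=200)
set_time = timeit.timeit(count_with_set, number=200)
print(f'str membership: {str_time:.4f}s, set membership: {set_time:.4f}s')
```

Both functions do the same filtering work, so any difference in the two timings comes from the membership test itself.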
