Reputation: 215
Decipher( S ) will be given a string of English text shifted by some amount. Then, decipher should return, to the best of its ability, the original English string, which will be some rotation (possibly 0) of the input S. This means I have to try each possible decoding and estimate how English they are.
My approach is to use letter frequencies:
def letProb( c ):
""" if c is an alphabetic character,
we return its monogram probability (for english),
otherwise we return 1.0 We ignore capitalization.
Adapted from
http://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
"""
if c == 'e' or c == 'E': return 0.1202
if c == 't' or c == 'T': return 0.0910
if c == 'a' or c == 'A': return 0.0812
if c == 'o' or c == 'O': return 0.0768
if c == 'i' or c == 'I': return 0.0731
if c == 'n' or c == 'N': return 0.0695
if c == 's' or c == 'S': return 0.0628
if c == 'r' or c == 'R': return 0.0602
if c == 'h' or c == 'H': return 0.0592
if c == 'd' or c == 'D': return 0.0432
if c == 'l' or c == 'L': return 0.0398
if c == 'u' or c == 'U': return 0.0288
if c == 'c' or c == 'C': return 0.0271
if c == 'm' or c == 'M': return 0.0261
if c == 'f' or c == 'F': return 0.0230
if c == 'y' or c == 'Y': return 0.0211
if c == 'w' or c == 'W': return 0.0209
if c == 'g' or c == 'G': return 0.0203
if c == 'p' or c == 'P': return 0.0182
if c == 'b' or c == 'B': return 0.0149
if c == 'v' or c == 'V': return 0.0111
if c == 'k' or c == 'K': return 0.0069
if c == 'x' or c == 'X': return 0.0017
if c == 'q' or c == 'Q': return 0.0011
if c == 'j' or c == 'J': return 0.0010
if c == 'z' or c == 'Z': return 0.0007
return 1.0
Also I use this formula:
def list_to_str( L ):
""" L must be a list of characters; then,
this returns a single string from them
"""
if len(L) == 0: return ''
return L[0] + list_to_str( L[1:] )
And this:
def rot(c, n):
""" rot rotates c, a single character forward by n spots in the
alphabet
input: a single character
input: a non-negative integer n between 0 and 25
output: c forward by n spots in the alphabet
"""
if 'a' <= c <= 'z':
neword = ord(c) +n
if neword > ord('z'):
neword = neword - 26
elif 'A' <= c <= 'Z':
neword = ord(c) + n
if neword > ord('Z'):
neword = neword - 26
else:
neword = ord(c)
return chr(neword)
And finally this:
def decipher( S ):
"""
"""
L = [[rot(c, n) for c in S] for n in range(26)]
LoL = [[sum(letProb(c)) for letter in encoding] for encoding in L ]
L = max(LoL)
return list_to_str(L)
The first two formulas are good, but in the final formula there is something wrong, for sure in the sentence:
LoL = [[sum(letProb(c)) for letter in encoding] for encoding in L ]
TypeError: 'float' object is not iterable
Upvotes: 3
Views: 8304
Reputation: 46759
You might find the following useful. You could use some kind of lookup for the letter probabilities. If a dictionary is used, you can use get
to provide your default probability of 1.0
for non A to Z characters.
Secondly, Python's string
module has a make_trans
function that could be used with translate
to perform your rotations.
import string
letter_probs = {
'e' : 0.1202, 't' : 0.0910, 'a' : 0.0812, 'o' : 0.0768,
'i' : 0.0731, 'n' : 0.0695, 's' : 0.0628, 'r' : 0.0602,
'h' : 0.0592, 'd' : 0.0432, 'l' : 0.0398, 'u' : 0.0288,
'c' : 0.0271, 'm' : 0.0261, 'f' : 0.0230, 'y' : 0.0211,
'w' : 0.0209, 'g' : 0.0203, 'p' : 0.0182, 'b' : 0.0149,
'v' : 0.0111, 'k' : 0.0069, 'x' : 0.0017, 'q' : 0.0011,
'j' : 0.0010, 'z' : 0.0007}
def rotate(text, rotate_by):
s_from = string.ascii_lowercase + string.ascii_uppercase
s_to = string.ascii_lowercase[rotate_by:] + string.ascii_lowercase[:rotate_by] + \
string.ascii_uppercase[rotate_by:] + string.ascii_uppercase[:rotate_by]
cypher_table = string.maketrans(s_from, s_to)
return text.translate(cypher_table)
def sum_probs(text):
global letter_probs
return sum(letter_probs.get(c.lower(), 1.0) for c in text)
def decipher(text):
probabilities = []
for r in range(26):
rotated = rotate(text, r)
probabilities.append((sum_probs(rotated), rotated, r))
return sorted(probabilities)
for probability, decoded_text, rotate_value in decipher("Mjqqt Btwqi"):
print '{:2.2f} {:2} "{}"'.format(probability, rotate_value, decoded_text)
This would display all possibilities in probability order:
1.17 19 "Fcjjm Umpjb"
1.20 16 "Czggj Rjmgy"
1.22 9 "Vszzc Kcfzr"
1.26 20 "Gdkkn Vnqkc"
1.28 13 "Zwddg Ogjdv"
1.29 4 "Qnuux Fxaum"
1.29 15 "Byffi Qilfx"
1.30 8 "Uryyb Jbeyq"
1.31 6 "Spwwz Hzcwo"
1.32 5 "Rovvy Gybvn"
1.32 0 "Mjqqt Btwqi"
1.33 12 "Yvccf Nficu"
1.34 1 "Nkrru Cuxrj"
1.37 23 "Jgnnq Yqtnf"
1.39 7 "Tqxxa Iadxp"
1.40 22 "Ifmmp Xpsme"
1.40 2 "Olssv Dvysk"
1.44 25 "Lipps Asvph"
1.45 17 "Dahhk Sknhz"
1.47 24 "Khoor Zruog"
1.49 11 "Xubbe Mehbt"
1.52 3 "Pmttw Ewztl"
1.56 10 "Wtaad Ldgas"
1.58 21 "Hello World"
1.61 14 "Axeeh Phkew"
1.68 18 "Ebiil Tloia"
As you can see, the answer is near the bottom.
Upvotes: 0
Reputation: 175
You are trying to sum(letProb(c)) where letProb returns a Float, and sum needs an iterable, a list o a generator, lets say. Try something like this.
LoL = [[sum(letProb(c) for letter in encoding)] for encoding in L ]
I also think that c maybe should be letter, but I don't, if you post complete code, rot function included, maybe I can help you better.
Upvotes: 1