Reputation: 1
I have set myself a task in Python: to encode a long text file as numbers, with 1-26 for the letters of the alphabet and values above 26 for non-alphanumeric characters. See the code below:
#open the file,read the contents and print out normally
my_file = open("timemachine.txt")
my_text = my_file.read()
print (my_text)
print ""
print ""
#open the file and read each line, taking out the eol chars
with open("timemachine.txt","r") as myfile:
    clean_text = "".join(line.rstrip() for line in myfile)
#close the file to prevent memory hogging
my_file.close()
#print out the result all in lower case
clean_text_lower = clean_text.lower()
print clean_text_lower
print ""
print ""
#establish a lowercase alphabet as a list
my_alphabet_list = []
my_alphabet = """ abcdefghijklmnopqrstuvwxyz.,;:-_?!'"()[] %/1234567890"""+"\n"+"\xef"+"\xbb"+"\xbf"
#find the index for each lowercase letter or non-alphanumeric
for letter in my_alphabet:
    my_alphabet_list.append(letter)
print my_alphabet_list,
print my_alphabet_list.index
print ""
print ""
#go through the text and find the corresponding letter of the alphabet
for letter in clean_text_lower:
    posn = my_alphabet_list.index(letter)
print posn,
When I run this I expect to get (1) the original text, (2) the text in lower case with the end-of-line characters stripped, (3) the code index used and finally (4) the converted codes. However, I only get the latter part of the original text; if I comment out (4), it prints all of the text. Why?
Upvotes: 0
Views: 59
Reputation: 658
The bit at the end:
for letter in clean_text_lower:
    posn = my_alphabet_list.index(letter)
print posn,
keeps reassigning posn without actually doing anything with it. Therefore, you will only get my_alphabet_list.index(letter) for the last letter in clean_text_lower.
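To make the point concrete, here is a minimal sketch. It assumes the print sits after the loop rather than inside its body. The short alphabet and text are made-up stand-ins for my_alphabet_list and clean_text_lower, not the original data, and the snippet is written so it runs under both Python 2 and Python 3.

```python
# Hypothetical stand-ins for my_alphabet_list and clean_text_lower.
alphabet = list(" abcdefghijklmnopqrstuvwxyz")
text = "cab"

for letter in text:
    posn = alphabet.index(letter)  # overwritten on every pass

# Only the value from the final iteration survives the loop,
# i.e. the index of the last letter in text.
print(posn)
```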
To fix this, there are a couple of things you could do. The first that springs to mind is to initialize a list and append the values to it, i.e.:
posns = []
for letter in clean_text_lower:
    posns.append(my_alphabet_list.index(letter))
print posns,
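A self-contained variant of the same idea might look like the sketch below. The name encode is a hypothetical helper, the sample text and shortened alphabet are invented for illustration, and the dictionary lookup is a swapped-in alternative to calling list.index for every character, not something from the original code. Mapping unknown characters to -1 instead of raising ValueError is also an assumption about the behaviour you want.

```python
def encode(text, alphabet):
    """Return the alphabet position of every character in text."""
    # Build the lookup once; list.index would rescan the alphabet
    # for each character of the text.
    lookup = {}
    for i, ch in enumerate(alphabet):
        lookup.setdefault(ch, i)  # keep the first occurrence, like list.index
    # Assumption: characters missing from the alphabet become -1.
    return [lookup.get(ch, -1) for ch in text]


# Shortened, made-up alphabet: space, a-z, then two punctuation marks.
sample_alphabet = " abcdefghijklmnopqrstuvwxyz.,"
posns = encode("hi, bob.", sample_alphabet)
print(posns)
```

Collecting the positions first and printing the list once keeps every value, whichever way the loop body is indented.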
Upvotes: 2