Reputation: 385
I read this: Stripping everything but alphanumeric chars from a string in Python
and this: Python: Strip everything but spaces and alphanumeric
I didn't quite understand them, but I experimented with my own code, which now looks like this:
import re
decrypt = str(open("crypt.txt"))
crypt = re.sub(r'([^\s\w]|_)+', '', decrypt)
print(crypt)
When I run the script, it comes back with this output:
C:\Users\Adrian\Desktop\python>python tick.py
ioTextIOWrapper namecrypttxt moder encodingcp1252
I am trying to strip all the extra characters from the document and keep just the numbers and letters. The document contains the following text: http://pastebin.com/Hj3SjhxC
I am trying to solve the assignment here: http://www.pythonchallenge.com/pc/def/ocr.html
Does anyone know what "ioTextIOWrapper namecrypttxt moder encodingcp1252" means? And how should I write the code to properly strip everything except letters and numbers?
Sincerely
Upvotes: 1
Views: 201
Reputation: 5122
You could just search for the alphanumeric chars instead. Like this:
print(''.join(re.findall('[A-Za-z0-9]', decrypt)))
You also need to read the file's contents rather than converting the file object to a string:
decrypt = open("crypt.txt").read()
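As a rough sketch of the findall idea, here it is wrapped in a small function and run on a made-up string. The name keep_alnum and the sample input are illustrative assumptions, not part of the puzzle data.

```python
import re

def keep_alnum(text):
    """Return only the ASCII letters and digits found in text, in order."""
    # findall returns a list of one-character matches; join glues them back together.
    return ''.join(re.findall(r'[A-Za-z0-9]', text))

# Hypothetical sample standing in for the contents of crypt.txt
result = keep_alnum("a%$b_ 1#2\n[c]")
print(result)  # ab12c
```

Unlike the re.sub pattern in the question, this also drops whitespace, because only letters and digits are ever matched.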
Upvotes: 3
Reputation: 251388
str(open("file.txt")) doesn't do what you think it does. open() returns a file object. str gives you the string representation of that file object, not the contents of the file. That is why the output in the question is the regex-stripped form of <_io.TextIOWrapper name='crypt.txt' mode='r' encoding='cp1252'>. If you want to read the contents of the file, use open("file.txt").read().
Or, more safely, use a with statement:
with open("file.txt") as f:
    decrypt = f.read()
crypt = ...
# etc.
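Putting the pieces together, a minimal end-to-end sketch might look like the following. It assumes the regex from the question is what you want to keep using; the function name clean_file and the temporary demo file are hypothetical stand-ins for your script and crypt.txt.

```python
import os
import re
import tempfile

def clean_file(path):
    """Read path and drop everything except whitespace, letters and digits."""
    with open(path) as f:   # the file is closed automatically when the block exits
        text = f.read()     # the actual contents, not the file object's repr
    # Same pattern as in the question: remove runs of characters that are
    # neither whitespace nor word characters, and remove underscores too.
    return re.sub(r'([^\s\w]|_)+', '', text)

# Hypothetical demo file standing in for crypt.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("%%$@_e#q(u)a_l!i^t&y 42\n")
    demo_path = tmp.name

cleaned = clean_file(demo_path)
print(cleaned)
os.remove(demo_path)  # clean up the demo file
```

For the real file you would call clean_file("crypt.txt") instead of creating a temporary one.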
Upvotes: 5