LoneDruidOfTheDarkArts

Reputation: 385

Stripping away code in Python using "re.sub"

I read this: Stripping everything but alphanumeric chars from a string in Python

and this: Python: Strip everything but spaces and alphanumeric

I didn't quite understand them, but I tried the approach in my own code, which now looks like this:

import re

decrypt = str(open("crypt.txt"))

crypt = re.sub(r'([^\s\w]|_)+', '', decrypt)

print(crypt)

When I run the script, it prints this:

C:\Users\Adrian\Desktop\python>python tick.py
ioTextIOWrapper namecrypttxt moder encodingcp1252

I am trying to strip all the extra characters from the document and keep only the letters and numbers. The document contains the following text: http://pastebin.com/Hj3SjhxC

I am trying to solve the assignment here: http://www.pythonchallenge.com/pc/def/ocr.html

Does anyone know what "ioTextIOWrapper namecrypttxt moder encodingcp1252" means? And how should I write the code so that it strips everything except letters and numbers?

Sincerely

Upvotes: 1

Views: 201

Answers (2)

jackcogdill

Reputation: 5122

You could search for the characters you want to keep instead (letters only, in this case). Like this:

print(''.join(re.findall('[A-Za-z]', decrypt)))

You also need to read the file's contents rather than calling str() on the file object:

decrypt = open("crypt.txt").read()
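
Put together, the idea above might look like the sketch below. The helper names keep_letters and keep_alphanumeric and the sample string are made up for illustration; they are not part of the question or of the re module.

```python
import re

def keep_letters(text):
    """Return only the ASCII letters of text, in their original order."""
    return ''.join(re.findall('[A-Za-z]', text))

def keep_alphanumeric(text):
    """Same idea, but digits are kept as well."""
    return ''.join(re.findall('[A-Za-z0-9]', text))

# Made-up sample input, not the contents of crypt.txt.
sample = "%$a@#b^&c 12!"

print(keep_letters(sample))       # abc
print(keep_alphanumeric(sample))  # abc12
```

Note that, unlike the pattern in the question, both helpers also drop whitespace, because a space matches neither character class.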

Upvotes: 3

BrenBarn

Reputation: 251388

str(open("file.txt")) doesn't do what you think it does. open() returns a file object. str gives you the string representation of that file object, not the contents of the file. If you want to read the contents of the file use open("file.txt").read().

Or, more safely, use a with statement:

with open("file.txt") as f:
    decrypt = f.read()
crypt = ... 
# etc.
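
To see where the odd output in the question probably came from, here is a small sketch. The file_repr string is an assumed reconstruction of what str(open("crypt.txt")) returned on the asker's machine (the printed output with its punctuation put back); the exact text varies by platform. The strip_file helper and the throwaway sample file are hypothetical and only there to make the example runnable.

```python
import os
import re
import tempfile

PATTERN = r'([^\s\w]|_)+'  # the pattern from the question

# 1) The regex was applied to the file object's string representation,
#    not to the file's contents. (Assumed reconstruction of that string.)
file_repr = "<_io.TextIOWrapper name='crypt.txt' mode='r' encoding='cp1252'>"
repr_stripped = re.sub(PATTERN, '', file_repr)
print(repr_stripped)  # ioTextIOWrapper namecrypttxt moder encodingcp1252

# 2) Reading the contents first gives the regex the real text to work on.
def strip_file(path):
    """Read the file at path; keep only letters, digits, and whitespace."""
    with open(path) as f:
        text = f.read()
    return re.sub(PATTERN, '', text)

# Demo on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w") as f:
        f.write("%%$@_a#b!c 12\n")
    file_stripped = strip_file(sample_path)

print(repr(file_stripped))  # 'abc 12\n'
```

The first print reproduces the text from the question: the pattern removed the angle brackets, underscore, dots, quotes, and equals signs from the file object's description and left the rest.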

Upvotes: 5
