Bob

Reputation: 1396

File Manipulation: How to take out Punctuation and Capital letters?

I'm trying to figure out how to open a file, make all the letters in the file lowercase, and then take out all the punctuation. I've tried a few things I've seen online and in my book but I can't seem to figure it out.

import string

def ReadFile(Filename):
    try:
        F = open(Filename)
        F2=F.read()
    except IOError:
        print("Can't open file:",Filename)
        return []
    F3=[]
    for word in F2:
        F3=F2.lower()
    exclude = set(string.punctuation)
    F3= ''.join(ch for ch in F3 if ch not in exclude)
    return F3
Name = input ('Name of file? ')
Words = ReadFile(Name)
print (F3)

Given a sentence such as:

Then he said, "I'm so confused!".

the output should be:

then he said im so confused

Upvotes: 1

Views: 314

Answers (2)

Shijing Lv

Reputation: 6736

There are many discussions on this topic. A simple and effective way is:

import string
s = "Then he said, \"I'm so confused!\"."
# Python 3: the third argument to str.maketrans lists the characters to delete
s.translate(str.maketrans("", "", string.punctuation))

Similar discussions can be found here:

Remove punctuation from Unicode formatted strings

Python Regex punctuation recognition

Best way to strip punctuation from a string in Python

Upvotes: 0

abarnert

Reputation: 365767

The problem with your code is in the very last line:

print (F3)

F3 was the name of the local variable inside the function. You can't access that from here.

But you can access the same value that was in that variable, because the function returned it, and you stored it in Words.

So, just do this:

print(Words)

And now, your code works.
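
As a rough illustration of the scoping rule (the names make_greeting, message, and result are made up for this sketch and are not from your code), a name assigned inside a function is gone once the function returns, while the returned value lives on under whatever name the caller gives it:

```python
def make_greeting():
    message = "hello"  # local name: exists only while the function runs
    return message     # the value, not the name, is handed back


result = make_greeting()  # the caller binds the returned value to its own name

try:
    print(message)  # no such name at module level
except NameError as err:
    print("NameError:", err)

print(result)  # the value is reachable through the caller's name
```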


That being said, it can be improved.

Most importantly, look at this part:

F3=[]
for word in F2:
    F3=F2.lower()

The for word in F2: actually loops over every character in F2, because that's how strings work. If you want to go word by word, you need to do something like for word in F2.split():
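
A small sketch of the difference, using a made-up sample string rather than your file contents:

```python
text = "Then he said"

# Iterating over a string yields one character at a time, spaces included.
chars = [item for item in text]

# Iterating over text.split() yields whitespace-separated words.
words = [item for item in text.split()]

print(chars[:4])
print(words)
```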

Meanwhile, inside the loop, you reassign F3 each time through the loop, and never do anything with the previous value, so the whole thing ends up being a very fancy (and slow) way to just do the last assignment.

Fortunately, the last assignment, F3=F2.lower() lowercases the entire string F2, which is exactly what you wanted to do, so it works out anyway. Which means you can replace all three of those lines with:

F3=F2.lower()

You should also always close files that you open. Since this can be tricky (e.g., in your function, you have to remember to close it in both the successful and error cases), the best way to do that is automatically, using a with statement. Replace these two lines:

F = open(Filename)
F2=F.read()

with:

with open(Filename) as F:
    F2=F.read()

After that, other than using a non-PEP-8 style, and performance problems if you have huge files, there's really nothing wrong with your code.
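
Putting those suggestions together, one possible cleaned-up version might look like the sketch below. The names clean_text and read_clean_text are my own, and I've assumed you would rather get an empty string than an empty list on failure so the return type stays consistent; adjust that to taste.

```python
import string


def clean_text(text):
    """Lowercase text and drop every character in string.punctuation."""
    exclude = set(string.punctuation)
    return "".join(ch for ch in text.lower() if ch not in exclude)


def read_clean_text(filename):
    """Read a file and return its cleaned contents, or "" if it can't be opened."""
    try:
        # The with statement closes the file on both success and error.
        with open(filename) as f:
            text = f.read()
    except IOError:
        print("Can't open file:", filename)
        return ""  # assumption: an empty string instead of the original []
    return clean_text(text)


print(clean_text('Then he said, "I\'m so confused!".'))
```

Splitting the cleaning step out of the file handling is just a convenience: it lets you try the cleaning on a plain string without needing a file on disk.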

Upvotes: 2
