Reputation: 159
I have a file that contains special characters, so I read it with ordinary file operations:
f=open('st.txt','r')
string=f.read()
The sample string is
"Free Quote!\n \n Protecting your family is the best investment you\'ll eve=\nr \n"
Now I want to remove all the special characters and keep only the words from the string, so that my string will be:
"Free Quote Protecting your family is the best investment you'll ever"
Upvotes: 4
Views: 11836
Reputation: 17052
Probably the simplest way to do this is a loop that tests each character against string.ascii_letters plus a small set of extra characters you want to keep (e.g., the apostrophe, the hyphen, and the space):
>>> import string
>>> text = "Free Quote!\n \n Protecting your family is the best investment you\'ll eve=\nr \n"
>>> ''.join([x for x in text if x in string.ascii_letters + '\'- '])
"Free Quote  Protecting your family is the best investment you'll ever "
As you edit longer and more complex texts, filtering on specific punctuation marks becomes less sustainable, and you'd need a more complex regex (for example, to decide when a ' is an apostrophe and when it is a quotation mark), but for the scope of your problem above, this should suffice.
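The filter above keeps every space, so the result still has a doubled space and a trailing one. A minimal sketch of one way to get exactly the string asked for, assuming that "=\n" in the sample is a soft line break that splits a single word (that reading is a guess based on "eve=\nr"; the names joined, allowed, kept, and cleaned are only illustrative):

```python
import string

text = "Free Quote!\n \n Protecting your family is the best investment you\'ll eve=\nr \n"

# Assumption: "=\n" is a soft line break inside a word, so remove it first.
joined = text.replace("=\n", "")

# Keep letters, apostrophes, hyphens, and whitespace; drop everything else.
allowed = set(string.ascii_letters + "'-" + string.whitespace)
kept = "".join(ch for ch in joined if ch in allowed)

# split() with no arguments splits on any run of whitespace,
# so re-joining with a single space collapses runs and trims the ends.
cleaned = " ".join(kept.split())
print(cleaned)
```

If the "=\n" sequences are not soft line breaks in your file, skip the replace step; "eve" and "r" will then stay separate words.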
Upvotes: 4
Reputation: 4330
I found 3 solutions. They're all close, but none is exactly what you want.
import re
in_string = "Free Quote!\n \n Protecting your family is the best investment you\'ll eve=\nr \n"
#variant 1
#Free Quote Protecting your family is the best investment youll eve r
out_string = ""
# split on whitespace, then strip non-word characters from each word
array = in_string.split()
for word in array:
    out_string += re.sub(r'[\W]', '', word) + " "
print(out_string)
#variant 2
#Free Quote Protecting your family is the best investment you ll eve r
print(" ".join(re.findall("[a-zA-Z]+", in_string)))
#variant 3
#FreeQuoteProtectingyourfamilyisthebestinvestmentyoullever
print(re.sub(r'[\W]', '', in_string))
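A possible fourth variant, sketched under two assumptions that are not in the question: that "=\n" marks a soft line break inside a word, and that an apostrophe should be kept only when it sits between letters. The pattern and the names without_soft_breaks and words are illustrative, not a definitive solution.

```python
import re

in_string = "Free Quote!\n \n Protecting your family is the best investment you\'ll eve=\nr \n"

# Assumption: "=\n" is a soft line break that splits one word in two.
without_soft_breaks = in_string.replace("=\n", "")

# Match runs of letters, optionally continued by an apostrophe plus more letters.
# The group is non-capturing, so findall returns the whole match.
words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", without_soft_breaks)

out_string = " ".join(words)
print(out_string)
```

Because the apostrophe is only matched between letters, a quotation mark at the start or end of a word is dropped while contractions such as you'll are kept.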
Upvotes: 1