sunny


How to get only words from the string using python

I have a file that contains special characters, and I read it like this:

with open('st.txt', 'r') as f:  # "with" closes the file automatically
    text = f.read()             # "text" avoids shadowing the string module

The sample string is

"Free Quote!\n          \n          Protecting your family is the best investment you\'ll eve=\nr \n" 

Now I want to remove all the special characters and keep only the words, so that the string becomes:

"Free Quote Protecting your family is the best investment you'll ever"

Upvotes: 4

Views: 11836

Answers (2)

jdotjdot


Probably the simplest approach is to loop over the characters and keep only those in string.ascii_letters plus a small set of extra characters (here the apostrophe, the hyphen, and the space):

>>> import string
>>> text = "Free Quote!\n \n Protecting your family is the best investment you'll eve=\nr \n"
>>> allowed = set(string.ascii_letters + "'- ")  # letters, apostrophe, hyphen, space
>>> ''.join(ch for ch in text if ch in allowed)
"Free Quote  Protecting your family is the best investment you'll ever "

With longer and more varied text, maintaining a hand-picked set of allowed characters becomes harder to sustain, and you would need a more careful regular expression (for example, to decide when ' is an apostrophe and when it is a quotation mark). For the text in the question, though, this should suffice.
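As one illustration of that kind of regex, here is a sketch (the pattern and the helper name words are assumptions, not part of the answer above) that keeps an apostrophe only when it sits between letters, so "you'll" survives but quotation marks around a word are dropped:

```python
import re

# A word is a run of ASCII letters, optionally followed by groups of
# apostrophe + letters. An apostrophe at the start or end of a word
# (i.e. a quotation mark) never matches, so it is discarded.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def words(text: str) -> str:
    """Hypothetical helper: return the matched words joined by single spaces."""
    return " ".join(WORD_RE.findall(text))

print(words("She said 'hello' and you'll agree."))
# She said hello and you'll agree
```

This pattern is ASCII-only and splits hyphenated words; changing the inner group to (?:['-][A-Za-z]+)* would keep "well-known" together. On the question's text it still returns "eve r", because it does nothing about the "=" plus newline inside "eve=\nr".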

Upvotes: 4

Sir l33tname


I found three approaches. They all come close, but none gives exactly what you want.

import re

in_string = "Free Quote!\n \n Protecting your family is the best investment you'll eve=\nr \n"

# variant 1: split on whitespace, then strip non-word characters from each token
# Free Quote Protecting your family is the best investment youll eve r
out_string = " ".join(re.sub(r"\W", "", word) for word in in_string.split())
print(out_string)

# variant 2: keep only runs of ASCII letters
# Free Quote Protecting your family is the best investment you ll eve r
print(" ".join(re.findall(r"[a-zA-Z]+", in_string)))

# variant 3: delete every non-word character, including spaces
# FreeQuoteProtectingyourfamilyisthebestinvestmentyoullever
print(re.sub(r"\W", "", in_string))
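Variant 3 gets "ever" right but loses the spaces, while variant 2 keeps the spaces but splits "you'll" and "eve r". The "=" at the end of a line in "eve=\nr" looks like a quoted-printable soft line break (common in e-mail bodies). Assuming the file really is quoted-printable, which the question does not confirm, the standard-library quopri module can undo it before the words are extracted. The helper name words_from_qp is hypothetical:

```python
import quopri
import re

def words_from_qp(text: str) -> str:
    """Hypothetical helper, assuming the input is quoted-printable text."""
    # quopri.decodestring() works on bytes: it removes "=\n" soft line breaks
    # and also decodes =XX hex escapes (e.g. "=3D" becomes "=").
    decoded = quopri.decodestring(text.encode("utf-8")).decode("utf-8", errors="replace")
    # Keep runs of ASCII letters and apostrophes, joined by single spaces.
    return " ".join(re.findall(r"[A-Za-z']+", decoded))

in_string = "Free Quote!\n \n Protecting your family is the best investment you'll eve=\nr \n"
print(words_from_qp(in_string))
# Free Quote Protecting your family is the best investment you'll ever
```

For the real file, reading in binary mode (open('st.txt', 'rb')) and passing the bytes straight to quopri.decodestring() avoids the encode step.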

Upvotes: 1
