Reputation: 39
Write the function list_of_words that takes a list of strings as above and returns a list of individual words with all white space and punctuation removed (except for apostrophes/single quotes).
My code removes periods and spaces, but not commas or exclamation points.

    def list_of_words(list_str):
        m = []
        for i in list_str:
            i.strip('.')
            i.strip(',')
            i.strip('!')
            m = m+i.split()
        return m

    print(list_of_words(["Four score and seven years ago, our fathers brought forth on",
                         "this continent a new nation, conceived in liberty and dedicated",
                         "to the proposition that all men are created equal. Now we are",
                         " engaged in a great civil war, testing whether that nation, or any",
                         "nation so conceived and so dedicated, can long endure!"]))

Upvotes: 0
Views: 3268
Reputation: 2424
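It would be better not to rely on your own list of punctuation but to use Python's string.punctuation and, as others have pointed out, a regex to remove the characters (the first line below keeps the backtick and apostrophe by taking them out of the set to remove):

    import re
    import string
    punctuations = re.sub("[`']", "", string.punctuation)
    i = re.sub("[" + re.escape(punctuations) + "]", "", i)  # re.escape protects ], \, ^ and - inside [...]

There's also string.whitespace, although split() already takes care of whitespace for you.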
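For what it's worth, here is a minimal sketch of how that idea could be fitted into the whole function. The names PUNCT_TO_DROP and PUNCT_PATTERN are mine, not from the question, and I am assuming only the apostrophe needs to be kept, as the exercise states.

```python
import re
import string

# All ASCII punctuation except the apostrophe, which the exercise keeps.
# (PUNCT_TO_DROP and PUNCT_PATTERN are illustrative names.)
PUNCT_TO_DROP = string.punctuation.replace("'", "")

# re.escape makes characters such as ] \ ^ - safe inside the [...] class.
PUNCT_PATTERN = re.compile("[" + re.escape(PUNCT_TO_DROP) + "]")


def list_of_words(list_str):
    words = []
    for line in list_str:
        # Remove the punctuation everywhere in the line, then split on whitespace.
        words.extend(PUNCT_PATTERN.sub("", line).split())
    return words


print(list_of_words(["Don't stop, believing!", "  (hold on)  "]))
# ["Don't", 'stop', 'believing', 'hold', 'on']
```

Compiling the pattern once outside the loop is just a convenience; calling re.sub inside the loop would give the same result.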
Upvotes: 0
Reputation: 52143
One of the easiest ways to remove some punctuation marks and collapse repeated whitespace is the re.sub function.

    import re
    sentence_list = ["Four score and seven years ago, our fathers brought forth on",
                     "this continent a new nation, conceived in liberty and dedicated",
                     "to the proposition that all men are created equal. Now we are",
                     " engaged in a great civil war, testing whether that nation, or any",
                     "nation so conceived and so dedicated, can long endure!"]
    sentences = [re.sub('([,.!]){1,}', '', sentence).strip() for sentence in sentence_list]
    words = ' '.join([re.sub('([" "]){2,}', ' ', sentence).strip() for sentence in sentences])
    print(words)

    Four score and seven years ago our fathers brought forth on this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal Now we are engaged in a great civil war testing whether that nation or any nation so conceived and so dedicated can long endure

Upvotes: 2
Reputation: 151
You could use regular expressions, as explained in this question. Essentially,

    import re
    i = re.sub('[.,!]', '', i)

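As a rough sketch, this is how that substitution might sit inside the function from the question. The variable names are my own choice, and the sample input is shortened from the question's text.

```python
import re


def list_of_words(list_str):
    words = []
    for line in list_str:
        # Delete every '.', ',' and '!' wherever it occurs in the line,
        # then let split() break the rest up on whitespace.
        words.extend(re.sub('[.,!]', '', line).split())
    return words


print(list_of_words(["seven years ago, our fathers", " can long endure!"]))
# ['seven', 'years', 'ago', 'our', 'fathers', 'can', 'long', 'endure']
```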
Upvotes: 1
Reputation: 1045
As suggested before, you need to assign the result of i.strip() back to i. And as mentioned before, the replace method is a better fit here, because it removes the character everywhere in the string. Here is an example using the replace method:

    def list_of_words(list_str: list) -> list:
        m = []
        for i in list_str:
            i = i.replace('.', '')
            i = i.replace(',', '')
            i = i.replace('!', '')
            m.extend(i.split())
        return m

    print(list_of_words(["Four score and seven years ago, our fathers brought forth on",
                         "this continent a new nation, conceived in liberty and dedicated",
                         "to the proposition that all men are created equal. Now we are",
                         " engaged in a great civil war, testing whether that nation, or any",
                         "nation so conceived and so dedicated, can long endure!"]))

As you may notice, I have also replaced m=m+i.split() with m.extend(i.split()) to make it easier to read.
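The example above calls m.extend(...). In case the difference from append is not obvious, here is a small sketch (the list names are made up for illustration): extend adds each word to the list, while append would add the whole result of split() as a single nested list.

```python
words_extend = []
words_append = []

for line in ["a new nation", "so dedicated"]:
    # extend() adds each word from the split result individually.
    words_extend.extend(line.split())
    # append() adds the split result itself, producing a list of lists.
    words_append.append(line.split())

print(words_extend)  # ['a', 'new', 'nation', 'so', 'dedicated']
print(words_append)  # [['a', 'new', 'nation'], ['so', 'dedicated']]
```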
Upvotes: 0
Reputation: 4318
strip returns a new string rather than changing i in place, so you have to assign the result back to i before applying the remaining strips. Your code should be changed to:

    for i in list_str:
        i = i.strip('.')
        i = i.strip(',')
        i = i.strip('!')
        ....

On a second note, strip removes the given characters only at the start and end of a string. If you want to remove characters in the middle of the string, you should consider replace.
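A quick sketch of both points, using a sample line I made up from the question's text:

```python
line = "this continent a new nation, conceived in liberty!"

# Strings are immutable: calling strip() without assigning changes nothing.
line.strip('!')
unchanged = line

# strip() only trims matching characters from the two ends, so the
# trailing '!' goes but the comma in the middle survives.
stripped = line.strip('!').strip(',')

# replace() removes every occurrence, wherever it sits in the string.
replaced = line.replace(',', '').replace('!', '')

print(unchanged)  # this continent a new nation, conceived in liberty!
print(stripped)   # this continent a new nation, conceived in liberty
print(replaced)   # this continent a new nation conceived in liberty
```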
Upvotes: 1