anonymous fox

Reputation: 39

Using strip() in Python

Write the function list_of_words that takes a list of strings as above and returns a list of individual words with all white space and punctuation removed (except for apostrophes/single quotes).

My code removes periods and spaces, but not commas or exclamation points.

def list_of_words(list_str):
    m = []
    for i in list_str:
        i.strip('.')
        i.strip(',')
        i.strip('!')
        m = m+i.split()
    return m

print(list_of_words(["Four score and seven years ago, our fathers brought forth on",
  "this continent a new nation, conceived in liberty and dedicated",
  "to the proposition that all men are created equal.  Now we are",
  "   engaged in a great        civil war, testing whether that nation, or any",
  "nation so conceived and so dedicated, can long endure!"]))

Upvotes: 0

Views: 3268

Answers (5)

Eran

Reputation: 2424

It would be better not to rely on your own list of punctuation but to use Python's built-in one and, as others have pointed out, use a regex to remove the characters:

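import re
import string
punctuations = re.sub("[`']", "", string.punctuation)  # keep apostrophes and backticks
i = re.sub("[" + re.escape(punctuations) + "]", "", i)  # escape so ], \ and ^ are literal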

There's also string.whitespace, although split() already takes care of whitespace for you.
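
Putting those pieces together, a minimal sketch of the whole function might look like the following. It assumes that only the apostrophe should survive (the snippet above also keeps backticks), and it passes the character list through re.escape so that characters such as ] and \ are treated literally inside the character class. The name PUNCT_RE is made up for this example.

```python
import re
import string

# Every punctuation character except the apostrophe (assumption: backticks go too).
PUNCT_RE = re.compile("[" + re.escape(string.punctuation.replace("'", "")) + "]")

def list_of_words(list_str):
    words = []
    for line in list_str:
        # Delete punctuation anywhere in the line, then split on runs of whitespace.
        words.extend(PUNCT_RE.sub("", line).split())
    return words

print(list_of_words(["  Don't stop,   believing!", "ok [yes] (no)."]))
```

Compiling the pattern once outside the loop is optional; it just avoids rebuilding the character class for every line.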

Upvotes: 0

Ozgur Vatansever

Reputation: 52143

One of the easiest ways to remove punctuation marks and collapse repeated whitespace is to use the re.sub function.

import re

sentence_list = ["Four score and seven years ago, our fathers brought forth on",
                 "this continent a new nation, conceived in liberty and dedicated",
                 "to the proposition that all men are created equal.  Now we are",
                 "   engaged in a great        civil war, testing whether that nation, or any",
                 "nation so conceived and so dedicated, can long endure!"]

# Remove commas, periods and exclamation marks, then trim each sentence.
sentences = [re.sub(r'[,.!]+', '', sentence).strip() for sentence in sentence_list]
# Collapse runs of two or more spaces into one and join the sentences.
words = ' '.join(re.sub(r' {2,}', ' ', sentence) for sentence in sentences)

print(words)
Four score and seven years ago our fathers brought forth on this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal Now we are engaged in a great civil war testing whether that nation or any nation so conceived and so dedicated can long endure

Upvotes: 2

nblivingston

Reputation: 151

You could use regular expressions, as explained in this question. Essentially,

import re

i = re.sub('[.,!]', '', i)
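
For context, here is a rough sketch of how that substitution might slot into the asker's loop. The sample input is shortened from the question, and the variable names are only for illustration.

```python
import re

def list_of_words(list_str):
    words = []
    for line in list_str:
        # Remove every '.', ',' and '!' wherever it appears in the line.
        line = re.sub('[.,!]', '', line)
        # split() with no argument also discards leading, trailing and repeated whitespace.
        words.extend(line.split())
    return words

print(list_of_words(["   engaged in a great        civil war, testing",
                     "can long endure!"]))
```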

Upvotes: 1

jkd

Reputation: 1045

As suggested before, you need to assign the result of i.strip() back to i. And as also mentioned, the replace method is a better fit. Here is an example using replace:

def list_of_words(list_str: list) -> list:
    m = []
    for i in list_str:
        i = i.replace('.', '')
        i = i.replace(',', '')
        i = i.replace('!', '')
        m.extend(i.split())
    return m

print(list_of_words([ "Four score and seven years ago, our fathers brought forth on",
  "this continent a new nation, conceived in liberty and dedicated",
  "to the proposition that all men are created equal.  Now we are",
  "   engaged in a great        civil war, testing whether that nation, or any",
  "nation so conceived and so dedicated, can long endure!"]))

As you may notice, I have also replaced m=m+i.split() with m.extend(i.split()) to make it easier to read.
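
The choice of extend over append matters here: extend adds each word to the list, whereas append would add the whole list returned by split() as a single nested element. A small illustration, with made-up variable names:

```python
words_extend = []
words_append = []

for line in ["a new nation", "so dedicated"]:
    words_extend.extend(line.split())   # adds each word individually
    words_append.append(line.split())   # adds the whole list as one element

print(words_extend)  # ['a', 'new', 'nation', 'so', 'dedicated']
print(words_append)  # [['a', 'new', 'nation'], ['so', 'dedicated']]
```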

Upvotes: 0

venpa

Reputation: 4318

strip returns a new string rather than modifying the original, so you need to assign the result back before applying the next strip. Your code should be changed to:

for i in list_str:
    i = i.strip('.')
    i = i.strip(',')
    i = i.strip('!')
    ....

On a second note, strip removes the given characters only from the start and end of a string. If you want to remove characters from the middle of a string, you should consider replace.
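
Both points can be checked in a few lines. This is only an illustration with a made-up sample string:

```python
s = "nation, or any!"

s.strip("!")                # the result is discarded, so s is unchanged
print(s)                    # nation, or any!

print(s.strip("!"))         # nation, or any   (trailing '!' removed)
print(s.strip(","))         # nation, or any!  (the comma is in the middle, so nothing changes)
print(s.replace(",", ""))   # nation or any!   (replace works anywhere in the string)
```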

Upvotes: 1
