Leon Surrao

Reputation: 637

Removing specific items from a list of strings

I have a list of strings and I want to remove specific characters from each string in it. Here is what I have so far:

s = [ "Four score and seven years ago, our fathers brought forth on",
      "this continent a new nation, conceived in liberty and dedicated"]

result = []
for item in s:
    words = item.split()
    for item in words:
        result.append(item)

print(result,'\n')

for item in result:
    g = item.find(',.:;')
    item.replace(item[g],'')
print(result)

The output is:

['Four', 'score', 'and', 'seven', 'years', 'ago,', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation,', 'conceived', 'in', 'liberty', 'and', 'dedicated']

In this case I wanted the new list to contain all the words, but it should not include any punctuation marks except for quotes and apostrophes.

 ['Four', 'score', 'and', 'seven', 'years', 'ago', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation', 'conceived', 'in', 'liberty', 'and', 'dedicated']

Even though I am using the find function, the result is the same as before. How can I make it print the words without the punctuation marks, and how can I improve the code?

Upvotes: 2

Views: 102

Answers (4)

Olivier Poulin

Reputation: 1796

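Alternatively, you could keep your loop and fix two things in it:

for item in result:
    g = item.find(',.:;')
    item.replace(item[g],'')

First, find(',.:;') searches for that whole four-character substring rather than for any one of the characters, so it returns -1 here. Split the punctuation into a list instead:

punc = [',','.',':',';']

Second, strings are immutable: replace() returns a new string and leaves item unchanged, so the result has to be stored back into the list. Iterate over punc inside for item in result: and remove each character in turn:

for p in punc:
    item = item.replace(p, '')

So the full loop is:

punc = [',','.',':',';']
for i, item in enumerate(result):
    for p in punc:
        item = item.replace(p, '')
    result[i] = item

With the sample input, this turns 'ago,' into 'ago' and 'nation,' into 'nation'.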
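For reference, here is a short sketch of why the loop in the question leaves the list unchanged. It only uses the built-in str.find and str.replace; the variable names are illustrative.

```python
word = "ago,"

# find() looks for the whole substring ',.:;', which never occurs, so it returns -1
index = word.find(",.:;")

# replace() returns a new string; the original string is left untouched
cleaned = word.replace(",", "")

print(index)    # -1
print(word)     # ago,
print(cleaned)  # ago
```

Because the return value of replace() is discarded in the question's loop, the words in result never change.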

Upvotes: 1

Travis

Reputation: 2058

s = [ "Four score and seven years ago, our fathers brought forth on", "this continent a new nation, conceived in liberty and dedicated"]

# Remove the punctuation characters, then split each line into words
# (Python 3; in Python 2 use x.translate(None, ',.:;') instead)
result = [x.translate(str.maketrans('', '', ',.:;')).split() for x in s]

# Flatten the list of lists of words into a single list of words (see http://stackoverflow.com/a/716761/1477364)
result = [inner for outer in result for inner in outer]

print(result)

Output:

['Four', 'score', 'and', 'seven', 'years', 'ago', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation', 'conceived', 'in', 'liberty', 'and', 'dedicated']
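The question asks to drop all punctuation except quotes and apostrophes. One possible sketch, assuming Python 3 and ASCII punctuation only, builds the set of characters to delete from string.punctuation. The names keep, to_delete, table and strip_punctuation are illustrative and not part of the answer above.

```python
import string

# Punctuation characters to keep (assumption: straight quotes and apostrophes only)
keep = "'\""
# Every ASCII punctuation character except the ones in `keep`
to_delete = "".join(ch for ch in string.punctuation if ch not in keep)
# Translation table that deletes every character in `to_delete`
table = str.maketrans("", "", to_delete)

def strip_punctuation(lines):
    """Return a flat list of words from `lines` with unwanted punctuation removed."""
    return [word for line in lines for word in line.translate(table).split()]

s = ["Four score and seven years ago, our fathers brought forth on",
     "this continent a new nation, conceived in liberty and dedicated"]
print(strip_punctuation(s))
```

Deleting characters before splitting means a word such as "don't" keeps its apostrophe while "ago," loses its comma.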

Upvotes: 1

Engineero

Reputation: 12908

You could strip all the characters that you want to get rid of after you split the string:

result = []
for line in s:
    words = line.split()
    for word in words:
        result.append(word.strip(",."))  # note the addition of .strip(...)

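You can add whatever characters you want to get rid of to the string argument of .strip(), all in one string. The example above strips commas and periods. Note that .strip() only removes characters from the start and end of each word, which is enough for trailing punctuation such as 'ago,'.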
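A small sketch of how .strip() behaves with a set of characters: it trims them from both ends of each word and leaves anything in the middle alone. The sample words are made up for illustration.

```python
words = ["ago,", "nation,", "e.g.", "(liberty)"]

# Each character in the argument is stripped from both ends, in any order
cleaned = [w.strip(",.:;()") for w in words]

print(cleaned)  # ['ago', 'nation', 'e.g', 'liberty']
```

The inner period of "e.g." survives, so use replace() or translate() instead if punctuation inside a word must go as well.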

Upvotes: 2

Krumelur

Reputation: 32497

You can do this by using re.split to specify a regular expression to split on, in this case every character that is not a letter or digit.

import re
result = []
for item in s:
    words = re.split("[^A-Za-z0-9]", item)  # split the current line, not the whole list s
    result.extend(x for x in words if x)  # keep only the non-empty elements
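Note that the character class [^A-Za-z0-9] also splits on apostrophes and quotes, which the question wanted to keep. A possible variation, assuming ASCII text, matches the words directly with re.findall instead of splitting; extract_words is a hypothetical helper name.

```python
import re

def extract_words(lines):
    """Return every run of letters, digits, apostrophes and double quotes in `lines`."""
    pattern = re.compile(r"[A-Za-z0-9'\"]+")
    words = []
    for line in lines:
        # findall returns all non-overlapping matches, so no empty strings appear
        words.extend(pattern.findall(line))
    return words

print(extract_words(["ago, our father's nation;"]))  # ['ago', 'our', "father's", 'nation']
```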

Upvotes: 2
