Reputation: 637
I have a list of strings and I want to remove specific characters from each string in it. Here is what I have so far:
    s = ["Four score and seven years ago, our fathers brought forth on",
         "this continent a new nation, conceived in liberty and dedicated"]
    result = []
    for item in s:
        words = item.split()
        for item in words:
            result.append(item)
    print(result,'\n')
    for item in result:
        g = item.find(',.:;')
        item.replace(item[g],'')
    print(result)
The output is:
['Four', 'score', 'and', 'seven', 'years', 'ago,', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation,', 'conceived', 'in', 'liberty', 'and', 'dedicated']
What I want is a new list that contains all the words but no punctuation marks, except for quotes and apostrophes:
['Four', 'score', 'and', 'seven', 'years', 'ago', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation', 'conceived', 'in', 'liberty', 'and', 'dedicated']
Even though I am using the find function, the result is the same as before. How can I correct the code so that it prints the words without the punctuation marks? And how can I improve the code?
Upvotes: 2
Views: 102
Reputation: 1796
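Or you could keep your loop,

    for item in result:
        g = item.find(',.:;')
        item.replace(item[g],'')

and split up ',.:;' into separate characters. As written, find(',.:;') searches for that whole four-character substring, which never occurs, so it returns -1.
Just add a list of punctuation marks like

    punc = [',', '.', ':', ';']

then iterate through it inside for item in result:. Strings are immutable, so replace returns a new string that has to be stored, like

    for p in punc:
        item = item.replace(p, '')

So the full loop is

    punc = [',', '.', ':', ';']
    cleaned = []
    for item in result:
        for p in punc:
            item = item.replace(p, '')  # replace returns a new string
        cleaned.append(item)
    result = cleaned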
I've tested this, it works.
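To see why the loop in the question leaves the list unchanged, here is a minimal sketch using one word from the question's output. The variable names are only for illustration.

```python
word = "ago,"

# find() looks for the whole substring ',.:;', which is not in the word
index = word.find(',.:;')
print(index)  # -1, so word[index] is simply the last character

# replace() returns a new string; the original string is not modified
stripped = word.replace(',', '')
print(repr(word), repr(stripped))
```

Because the new string is never assigned back or stored in a list, the original loop discards every replacement.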
Upvotes: 1
Reputation: 2058
    s = ["Four score and seven years ago, our fathers brought forth on",
         "this continent a new nation, conceived in liberty and dedicated"]
    # Delete the punctuation characters and split into words
    # (Python 3; in Python 2 use x.translate(None, ',.:;') instead)
    table = str.maketrans('', '', ',.:;')
    result = [x.translate(table).split() for x in s]
    # Make a list of words instead of a list of lists of words (see http://stackoverflow.com/a/716761/1477364)
    result = [inner for outer in result for inner in outer]
    print(result)
Output:
['Four', 'score', 'and', 'seven', 'years', 'ago', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation', 'conceived', 'in', 'liberty', 'and', 'dedicated']
Upvotes: 1
Reputation: 12908
You could strip all the characters that you want to get rid of after you split the string:
    for item in s:
        words = item.split()
        for word in words:
            result.append(word.strip(",."))  # note the addition of .strip(...)
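You can add whatever characters you want to get rid of to the string argument to .strip(), all in one string. The example above strips out commas and periods. Note that .strip() only removes those characters from the start and end of each word, not from the middle.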
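As a rough sketch of the same idea applied to the question's requirement (drop punctuation but keep quotes and apostrophes), the set of characters to strip can be built from string.punctuation. The names STRIP_CHARS and split_words are made up for this example.

```python
import string

# Every ASCII punctuation character except the single and double quote
STRIP_CHARS = "".join(c for c in string.punctuation if c not in "'\"")

def split_words(lines):
    """Split each line on whitespace and strip punctuation from word ends."""
    result = []
    for line in lines:
        for word in line.split():
            cleaned = word.strip(STRIP_CHARS)
            if cleaned:  # skip tokens that consisted only of punctuation
                result.append(cleaned)
    return result

s = ["Four score and seven years ago, our fathers brought forth on",
     "this continent a new nation, conceived in liberty and dedicated"]
print(split_words(s))
```

Since only the ends of each word are stripped, an apostrophe inside a word such as don't is left alone.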
Upvotes: 2
Reputation: 32497
You can do this by using re.split to specify a regular expression to split on, in this case everything that is not a letter or digit.

    import re
    result = []
    for item in s:
        words = re.split("[^A-Za-z0-9]", item)  # split each string, not the whole list
        result.extend(x for x in words if x)  # Include nonempty elements
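Splitting on everything that is not a letter or digit also removes quotes and apostrophes, which the question wanted to keep. One possible variation, sketched here with re.findall instead of re.split, matches runs of the characters to keep. The character class and the names WORD_RE and extract_words are assumptions for this example.

```python
import re

# Assumed set of characters to keep: letters, digits, apostrophes, double quotes
WORD_RE = re.compile(r"[A-Za-z0-9'\"]+")

def extract_words(lines):
    """Collect every run of kept characters from each line."""
    result = []
    for line in lines:
        result.extend(WORD_RE.findall(line))
    return result

s = ["Four score and seven years ago, our fathers brought forth on",
     "this continent a new nation, conceived in liberty and dedicated"]
print(extract_words(s))
```

findall never returns empty matches for this pattern, so no filtering step is needed.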
Upvotes: 2