Reputation: 93
I want to check whether any string in a list contains a digit at any position and, if so, remove the digits from that string using Python. My code is:
pattern = '\w+-\w+[-\w+]*|-';
pattern2 = '\d'
contents = ["babies","walked","boys","walking", "CD28", "IL-2", "honour"];
for token in contents:
    if token.endswith("ies"):
        f.write(string.replace(token,'ies','y',1))
    elif token.endswith('s'):
        f.write(token[0:-1])
    elif token.endswith("ed"):
        f.write(token[0:-2])
    elif token.endswith("ing"):
        f.write(token[0:-3])
    elif re.match(pattern,token):
        f.write(string.replace(token,'-',""))
    elif re.match(pattern2,token):
        f.write(token.translate(None,"0123456789"))
    else:
        f.write(t)
f.close()
The problem is in re.match(pattern2,token): it does not detect a digit in the token, although f.write(token.translate(None,"0123456789")) worked well when I used it on its own.
Upvotes: 1
Views: 114
Reputation: 93
# Python 2 code: string.replace and str.translate(None, ...) are not available in Python 3.
import re
import string

f = open("stemming.txt", 'w')
# Hyphenated tokens such as "IL-2", or a lone hyphen.
pattern = r'\w+-\w+[-\w+]*|-'
# Compiled so that search() can find a digit anywhere in the token.
digits = re.compile(r'\d')
contents = ["babies", "walked", "boys", "walking", "CD28", "IL-2", "honour"]
for token in contents:
    if token.endswith("ies"):
        f.write(string.replace(token, 'ies', 'y', 1))
    elif token.endswith('s'):
        f.write(token[0:-1])
    elif token.endswith("ed"):
        f.write(token[0:-2])
    elif token.endswith("ing"):
        f.write(token[0:-3])
    elif re.match(pattern, token):
        f.write(string.replace(token, '-', ""))
    elif digits.search(token):
        # search() scans the whole token; match() only checks its start.
        f.write(token.translate(None, "0123456789"))
    else:
        f.write(token)
f.close()
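The switch from re.match to digits.search is what makes the digit branch fire: re.match only tries the pattern at the start of the string, so '\d' never matches a token like "CD28", whereas search scans the whole string. A minimal sketch of the difference (the variable names here are illustrative, not taken from the code above):

```python
import re

digit = re.compile(r"\d")
token = "CD28"

# match() only tries the pattern at position 0, where "CD28" has a letter.
anchored = digit.match(token)

# search() scans forward and stops at the first digit it finds.
found = digit.search(token)

print(anchored)       # None
print(found.group())  # 2
print(found.start())  # 2
```
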
Upvotes: 0
Reputation: 180441
If you want to remove digits use str.translate:
contents = ["IL-2", "CD-28","IL2","25"];
print([s.translate(None,"0123456789") for s in contents])
['IL-', 'CD-', 'IL', '']
If you only want to remove the digits when the string is not made up entirely of digits:
print([s.translate(None,"0123456789") if not s.isdigit() else s for s in contents])
['IL-', 'CD-', 'IL', '25']
If the digits are always at the end you can use rstrip:
print([s.rstrip("0123456789") for s in contents])
['IL-', 'CD-', 'IL', '']
For Python 3 you need to create a table using str.maketrans:
dig = "0123456789"
tbl = str.maketrans({k:"" for k in dig})
print([s.translate(tbl) for s in contents])
['IL-', 'CD-', 'IL', '']
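For reference, the three variants above can be collected into one Python 3 sketch. It assumes the three-argument form of str.maketrans, whose third argument lists the characters to delete; the function names are hypothetical and chosen only for illustration.

```python
from string import digits

# Three-argument maketrans: every character in the third argument is deleted.
DELETE_DIGITS = str.maketrans("", "", digits)

def remove_digits(s):
    """Remove every ASCII digit from s."""
    return s.translate(DELETE_DIGITS)

def remove_digits_if_mixed(s):
    """Remove digits unless s consists only of digits."""
    return s if s.isdigit() else s.translate(DELETE_DIGITS)

def strip_trailing_digits(s):
    """Remove digits only from the end of s."""
    return s.rstrip(digits)

contents = ["IL-2", "CD-28", "IL2", "25"]
print([remove_digits(s) for s in contents])           # ['IL-', 'CD-', 'IL', '']
print([remove_digits_if_mixed(s) for s in contents])  # ['IL-', 'CD-', 'IL', '25']
print([strip_trailing_digits(s) for s in contents])   # ['IL-', 'CD-', 'IL', '']
```
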
Upvotes: 5
Reputation: 107297
You can just use re.sub within a list comprehension:
>>> contents = ["IL-2", "CD-28","IL2","25"]
>>> import re
>>> [re.sub(r'\d','',i) for i in contents]
['IL-', 'CD-', 'IL', '']
For a task like this, though, the str.translate method is a better solution:
>>> from string import digits
>>> [i.translate(None,digits) for i in contents]
['IL-', 'CD-', 'IL', '']
And if you are on Python 3:
>>> trans_table = dict.fromkeys(map(ord,digits), None)
>>> [i.translate(trans_table) for i in contents]
['IL-', 'CD-', 'IL', '']
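One difference between the two approaches may matter on Python 3: for str patterns, \d matches any Unicode decimal digit, while string.digits contains only the ASCII digits 0-9. A small sketch of this, using a made-up token that ends in an Arabic-Indic digit (the token and variable names are assumptions for illustration only):

```python
import re
from string import digits

table = dict.fromkeys(map(ord, digits), None)
token = "IL\u0662"  # "IL" followed by ARABIC-INDIC DIGIT TWO (hypothetical input)

# Default str pattern: \d is Unicode-aware, so the digit is removed.
by_regex = re.sub(r"\d", "", token)

# With re.ASCII, \d is restricted to 0-9, so the token is unchanged.
by_regex_ascii = re.sub(r"\d", "", token, flags=re.ASCII)

# The translation table only covers 0-9, so the token is unchanged.
by_table = token.translate(table)

print(by_regex == "IL")         # True
print(by_regex_ascii == token)  # True
print(by_table == token)        # True
```

If only ASCII digits should be removed, the translate table or the re.ASCII flag keeps the two approaches consistent.
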
Upvotes: 6