user3092781

Reputation: 313

Find and remove some substrings from a long list of strings in Python

I need to read a list of strings and remove some special characters from them. My code works, but I am looking for a more efficient way to write it, because I need to run this process on 1 million long lists (e.g. each list has 100000 words).

Here is an example to clarify my question.

input:
 str= ['short', 'club', 'edit', 'post\C2', 'le\C3', 'lundi', 'janvier', '2008'] 
 specialSubString=['\C2','\C3','\E2'] 

output:
 str= ['short', 'club', 'edit', 'post', 'le', 'lundi', 'janvier', '2008'] 

My code:

ml=len(str)
for w in range(0,ml):
   for i in range(0, len(specialSubString)):
       token=specialSubString[i]
       if token not in str[w]: 
          continue
       else:
          l= len(token)
          t= str[w]
          end= len(t)-l
          str[w]=t[:end]
          break

for w in str:
    print w

Upvotes: 1

Views: 1402

Answers (1)

TigerhawkT3

Reputation: 49320

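Create a string containing all the special characters you'd like to remove, and strip them off the right side of each item. Keep in mind that str.rstrip treats its argument as a set of characters rather than as a substring, so any trailing run of \, C, E, 2, or 3 is removed, including from ordinary words that happen to end in one of those characters: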

strings = ['short', 'club', 'edit', 'post\C2', 'le\C3', 'lundi', 'janvier', '2008']
special = ''.join(['\C2','\C3','\E2']) # see note

Note that \ is the escape character in string literals, so you should escape it (write \\) whenever you mean a literal backslash, to avoid ambiguity. You can also simply write a single string literal rather than using str.join.

special = '\\C2\\C3\\E2' # that's better

strings[:] = [item.rstrip(special) for item in strings]
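
If the markers have to be removed only as whole trailing substrings, one possible alternative is a precompiled regular expression anchored at the end of each string. This is a minimal sketch, not a benchmarked solution: the helper name strip_suffixes is made up for illustration, and it assumes each marker only ever appears at the very end of a word, as in the question's example.

```python
import re

def strip_suffixes(words, suffixes):
    """Return a new list with a trailing suffix removed from each word, if present.

    Hypothetical helper for illustration; assumes markers only occur at the end.
    """
    # re.escape keeps the backslashes literal; \Z anchors the match at the
    # very end of the string, so only a whole trailing marker is removed.
    pattern = re.compile("(?:" + "|".join(re.escape(s) for s in suffixes) + r")\Z")
    return [pattern.sub("", word) for word in words]

strings = ['short', 'club', 'edit', 'post\\C2', 'le\\C3', 'lundi', 'janvier', '2008']
special = ['\\C2', '\\C3', '\\E2']

cleaned = strip_suffixes(strings, special)
# cleaned == ['short', 'club', 'edit', 'post', 'le', 'lundi', 'janvier', '2008']

# Character-set stripping shortens an ordinary word; suffix matching leaves it alone.
print('ABC'.rstrip('\\C2\\C3\\E2'))         # AB
print(strip_suffixes(['ABC'], special)[0])  # ABC
```

The pattern is compiled once per call, so for many lists that share the same markers it may be worth compiling it once outside the function and reusing it.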

Upvotes: 3
