Reputation: 488
I have the list below:
a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']
I need to remove the spaces inside each element and strip out everything after the \n.
I also need to remove duplicates from the list.
The expected output is ['test_dev', 'progra', 'test']
My code is below:
import re

def remove_tags(text):
    tag_re = re.compile(r'<[^>]+>')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

def remove_tags_newline(text):
    tag_re = re.compile(r'\n')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

l = []
for i in a:
    s = remove_tags_newline(remove_tags(i))
    if s not in l:
        l.append(s)
l
My output is ['\\ntest_dev\\n$', 'progra', 'test']
The expected output is ['test_dev', 'progra', 'test']
Upvotes: 1
Views: 94
Reputation: 627103
As you mentioned, you only have line feed chars in the input, not combinations of a backslash and n.
In this case, you can fix your code by using
def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())
It does the following:
re.sub('(?s)\n.*', '', text.strip()) - removes any leading/trailing whitespace chars and then removes any text after the first line feed char, including it (note that (?s) is an inline modifier equivalent to re.S / re.DOTALL that lets . match across lines, \n matches LF chars, and .* matches any zero or more chars, as many as possible)
.split() - splits the string on whitespace
"".join(...) - concatenates all the strings from the list into a single string without adding any delimiters between the items (thus, together with .split(), it removes any whitespace).
See the Python demo:
import re

a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']

def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

print( [remove_tags_newline(x) for x in a] )
# => ['test_dev', 'progra', 'test', 'test']
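The demo output still contains 'test' twice, while the question also asks for duplicates to be removed. One possible way to do that while keeping the original order is to pass the cleaned strings through dict.fromkeys. This is only a sketch under that assumption; the variable name unique is illustrative and not part of the original code.

```python
import re

a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']

def remove_tags_newline(text):
    # Trim, drop everything from the first line feed onward, then remove all whitespace
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

# dict.fromkeys keeps only the first occurrence of each key and
# preserves insertion order (guaranteed since Python 3.7)
unique = list(dict.fromkeys(remove_tags_newline(x) for x in a))
print(unique)
# => ['test_dev', 'progra', 'test']
```

If order does not matter, set(...) would also work, but the order of the result would then not be guaranteed.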
Upvotes: 1