How to remove characters after new line and space between words

Question

I have the list below:

a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']

I need to remove the spaces inside each element and strip out everything after the \n.
I also need to remove duplicates from the list.

The expected output is ['test_dev', 'progra', 'test']

Code is below

import re

def remove_tags(text):
    # Strip anything that looks like an HTML tag, then remove spaces
    tag_re = re.compile(r'<[^>]+>')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

def remove_tags_newline(text):
    tag_re = re.compile(r'\n')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

# Collect the cleaned strings, skipping duplicates
l = []
for i in a:
    s = remove_tags_newline(remove_tags(i))
    if s not in l:
        l.append(s)
l

My output is ['\ntest_dev\n$', 'progra', 'test'], but the expected output is ['test_dev', 'progra', 'test']

Wiktor Stribiżew · Accepted Answer

As you mentioned, you only have line feed chars in the input, not combinations of backslash and n.

In this case, you can fix your code by using

def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

It does the following:

re.sub('(?s)\n.*', '', text.strip()) - removes any leading/trailing whitespace chars and then removes the first line feed char together with all text after it (note that (?s) is the inline equivalent of re.S/re.DOTALL, which lets . match across lines; \n matches an LF char, and .* matches any zero or more chars, as many as possible)
.split() - splits the string on runs of whitespace
"".join(...) - concatenates all the strings from the list into a single string without adding any delimiter between the items (so, together with .split(), it removes all whitespace).
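To make the three steps concrete, here is a small sketch that runs them one at a time on the first element of the question's list. The intermediate variable names (stripped, first_line, parts, result) are only illustrative and are not part of the answer's code.

```python
import re

text = '\ntest_ dev\n$'  # first element of the list from the question

# Step 1: trim leading/trailing whitespace (the trailing '$' is not whitespace, so it stays)
stripped = text.strip()                          # 'test_ dev\n$'

# Step 2: drop the first line feed and everything after it
first_line = re.sub('(?s)\n.*', '', stripped)    # 'test_ dev'

# Step 3: split on whitespace and glue the pieces back together
parts = first_line.split()                       # ['test_', 'dev']
result = "".join(parts)                          # 'test_dev'

print(result)
```

Note that the order matters: stripping first removes the leading line feed, so the regex only cuts at the line feed that follows the wanted text.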

See the Python demo:

import re
a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']
def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())
print( [remove_tags_newline(x) for x in a] )
# => ['test_dev', 'progra', 'test', 'test']
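The demo output still contains 'test' twice, while the question also asks for duplicates to be removed. One possible way to get there, as a sketch that goes beyond what the answer itself shows, is to pass the cleaned strings through dict.fromkeys, which keeps the first occurrence of each value in its original order (this relies on dicts preserving insertion order, guaranteed since Python 3.7). The name unique is illustrative.

```python
import re

def remove_tags_newline(text):
    # Same function as in the answer: keep the first line, drop all whitespace
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']

# dict.fromkeys keeps only the first occurrence of each key, in insertion order
unique = list(dict.fromkeys(remove_tags_newline(x) for x in a))
print(unique)
# => ['test_dev', 'progra', 'test']
```

The loop with "if s not in l" from the question would also work here; dict.fromkeys just avoids the repeated list scans.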
