sim

Reputation: 488

How to remove characters after new line and space between words

I have the list below:

a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']

The expected output is ['test_dev', 'progra', 'test'].

My code is below:

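import re

def remove_tags(text):
    # Strip anything that looks like an HTML/XML tag, then drop spaces
    tag_re = re.compile(r'<[^>]+>')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

def remove_tags_newline(text):
    # Remove line feed chars only (text after them is kept), then drop spaces
    tag_re = re.compile(r'\n')
    remove_tag = tag_re.sub('', text)
    return remove_tag.replace(" ", "")

l = []
for i in a:
    s = remove_tags_newline(remove_tags(i))
    if s not in l:  # skip duplicates, keep first-seen order
        l.append(s)
l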

My output is ['\\ntest_dev\\n$', 'progra', 'test'], but the expected output is ['test_dev', 'progra', 'test'].

Upvotes: 1

Views: 94

Answers (1)

Wiktor Stribiżew

Reputation: 627103

As you mentioned, you only have line feed chars in the input, not combinations of backslash and n.

In this case, you can fix your code by using

def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

It does the following:

  • re.sub('(?s)\n.*', '', text.strip()) - first strips any leading/trailing whitespace chars, then removes the first line feed char together with all the text after it. Here, (?s) is the inline equivalent of re.S/re.DOTALL that lets . match line break chars too, \n matches an LF char, and .* matches any zero or more chars, as many as possible.
  • .split() - splits the string on runs of whitespace and returns a list of the non-whitespace chunks
  • "".join(...) - concatenates all the strings from the list into a single string without adding any delimiter between the items (thus, together with .split(), it removes all remaining whitespace).
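
To make the three steps easier to follow, here is a rough sketch that splits the one-liner into named intermediate values. The function name remove_tags_newline_steps and the variable names are hypothetical and only used for illustration; the logic is meant to be the same as in the one-liner above.

```python
import re

def remove_tags_newline_steps(text):
    # Hypothetical helper: same logic as the one-liner, one step per line.
    stripped = text.strip()                          # '\ntest_ dev\n$' -> 'test_ dev\n$'
    first_line = re.sub(r'(?s)\n.*', '', stripped)   # 'test_ dev\n$'   -> 'test_ dev'
    parts = first_line.split()                       # 'test_ dev'      -> ['test_', 'dev']
    return "".join(parts)                            # ['test_', 'dev'] -> 'test_dev'

print(remove_tags_newline_steps('\ntest_ dev\n$'))
# => test_dev
```

Note that the order matters: stripping first means a leading line feed is gone before the regex runs, so the regex only cuts at a line feed that appears inside the text.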

See the Python demo:

import re
a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']
def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())
print([remove_tags_newline(x) for x in a])
# => ['test_dev', 'progra', 'test', 'test']
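
The demo keeps the duplicate 'test', while the expected output in the question lists it once. If duplicates should be dropped while keeping the first-seen order (as the loop in the question does), one possible sketch is to pass the cleaned strings through dict.fromkeys. The variable name unique is an assumption made for this example.

```python
import re

def remove_tags_newline(text):
    return "".join(re.sub('(?s)\n.*', '', text.strip()).split())

a = ['\ntest_ dev\n$', 'pro gra', 'test\n', 'test\n']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats without reordering the items.
unique = list(dict.fromkeys(remove_tags_newline(x) for x in a))
print(unique)
# => ['test_dev', 'progra', 'test']
```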

Upvotes: 1
