Vinod

Reputation: 533

Search for a string and delete the matching line

First, I read a file from AWS S3:

import boto3
client = boto3.client("s3")  # sample_bucket is defined elsewhere in my code
sample_html = client.get_object(Bucket=sample_bucket, Key='/temp/sample.html')
sample_html = sample_html['Body'].read().decode("UTF-8")

Suppose sample_html now contains the following text:

sample_html = """<p>this is a sample text</p>
<div id=1234>this is a sample and sample text</div>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>"""

How can I search for the string "1234" and delete the entire line that contains it, so that the output looks like this:

"<p>this is a sample text</p>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>"

Upvotes: 0

Views: 44

Answers (3)

Mafor

Reputation: 10681

You can get a list of the filtered lines with a list comprehension (see list comprehensions):

[ln for ln in sample_html.split('\n') if '1234' not in ln]

If you want a string back, join the lines with str.join:

'\n'.join([ln for ln in sample_html.split('\n') if '1234' not in ln])
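The same filtering can also be done in a single regular-expression pass with re.sub instead of split/join. This is a swapped-in alternative, not the approach above, and remove_matching_lines is a hypothetical helper name:

```python
import re


def remove_matching_lines(text: str, needle: str) -> str:
    """Delete every line of text that contains needle, using one regex pass."""
    # re.MULTILINE lets ^ match at the start of every line; [^\n]* never
    # crosses a line break; the trailing \n? also removes the deleted line's
    # newline. re.escape makes characters such as '.' match literally.
    pattern = re.compile(r"^[^\n]*" + re.escape(needle) + r"[^\n]*\n?", re.MULTILINE)
    return pattern.sub("", text)


sample_html = """<p>this is a sample text</p>
<div id=1234>this is a sample and sample text</div>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>"""

print(remove_matching_lines(sample_html, "1234"))
```

One difference from the split/join version: if the matching line is the last one and has no trailing newline, the newline before it is left in place, so the result ends with "\n".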

Upvotes: 2

Conan Li

Reputation: 494

Try this one-liner:

"\n".join(line for line in sample_html.split("\n") if "1234" not in line)

In [2]: sample_html = """<p>this is a sample text</p>
   ...: <div id=1234>this is a sample and sample text</div>
   ...: <figure>this is again a sample text</figure>
   ...: <p>this is a sample text</p>"""

In [3]: "\n".join(line for line in sample_html.split("\n") if "1234" not in line)
Out[3]: '<p>this is a sample text</p>\n<figure>this is again a sample text</figure>\n<p>this is a sample text</p>'

In [4]: print(_)
<p>this is a sample text</p>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>

In [5]:
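To apply the same one-liner directly to the object downloaded in the question, the read and the filter can be wrapped in one helper. This is a minimal sketch assuming `client` behaves like a boto3 S3 client (get_object returns a dict whose "Body" has a read() method); read_and_filter_s3_text, FakeS3Client and the bucket name "my-bucket" are hypothetical, and the fake client exists only so the snippet runs without AWS access:

```python
import io


def read_and_filter_s3_text(client, bucket: str, key: str, needle: str) -> str:
    """Fetch an S3 object, decode it as UTF-8 and drop every line containing needle."""
    # client is assumed to behave like boto3.client("s3"): get_object() returns
    # a dict whose "Body" value has a read() method returning bytes.
    response = client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")
    # Same filter as the one-liner above: split on "\n", keep, rejoin.
    return "\n".join(line for line in text.split("\n") if needle not in line)


class FakeS3Client:
    """Hypothetical stand-in for boto3.client("s3"), so the sketch runs offline."""

    def __init__(self, data: bytes):
        self._data = data

    def get_object(self, Bucket: str, Key: str) -> dict:
        # Only the "Body" entry that the helper reads is provided.
        return {"Body": io.BytesIO(self._data)}


fake = FakeS3Client(
    b"<p>this is a sample text</p>\n"
    b"<div id=1234>this is a sample and sample text</div>\n"
    b"<figure>this is again a sample text</figure>\n"
    b"<p>this is a sample text</p>"
)
print(read_and_filter_s3_text(fake, "my-bucket", "/temp/sample.html", "1234"))
```

With real AWS access you would pass boto3.client("s3") and the question's sample_bucket instead of the fake client.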

Upvotes: 2

Andreas

Reputation: 9197

Your text contains "hidden" newline characters ("\n") that mark the line breaks, so you can split the text into lines on "\n" and then filter the resulting list, as shown below:

sample_html = '''<p>this is a sample text</p>
<div id=1234>this is a sample and sample text</div>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>'''

sample_html_out = []
for line in sample_html.split("\n"):
    # keep only the lines that do not contain "1234"
    if "1234" not in line:
        sample_html_out.append(line)

print("\n".join(sample_html_out))

Or, as a one-liner:

print("\n".join([l for l in sample_html.split("\n") if "1234" not in l]))
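A plain substring test also drops lines where "1234" merely appears in the text (for example "<p>ticket 1234</p>"). If the goal is specifically the element with id 1234, a tighter match on the id attribute is one option. This is a heuristic sketch, not an HTML parser: it assumes the attribute is written as id=1234, id="1234" or id='1234' with no spaces around "=", and remove_lines_with_id is a hypothetical name:

```python
import re


def remove_lines_with_id(text: str, element_id: str) -> str:
    """Drop lines containing an id attribute whose value is exactly element_id."""
    # Heuristic regex, assuming no spaces around '=' and lowercase 'id'.
    pattern = re.compile(
        r"(?<![\w-])id="      # 'id=' not preceded by a word char or '-' (skips data-id=)
        r"([\"']?)"           # optional opening quote, captured as group 1
        + re.escape(element_id)
        + r"\1"               # the same quote again (or nothing if unquoted)
        r"(?=[\s>/])"         # value must end at whitespace, '>' or '/'
    )
    return "\n".join(line for line in text.split("\n") if not pattern.search(line))


sample_html = """<p>this is a sample text</p>
<div id=1234>this is a sample and sample text</div>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>"""

print(remove_lines_with_id(sample_html, "1234"))
# A line that only mentions 1234 in its text is kept:
print(remove_lines_with_id("<p>ticket 1234</p>\n<div id=1234>x</div>", "1234"))
```

The lookahead also keeps id=12345 from matching when looking for 1234.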

Upvotes: 1
