Reputation: 533
First, I am reading a file from AWS S3:
import boto3
client = boto3.client("s3")
sample_html = client.get_object(Bucket=sample_bucket, Key='/temp/sample.html')
sample_html = sample_html['Body'].read().decode("UTF-8")
Suppose sample_html now contains the following text:
sample_html = """<p>this is a sample text</p>
<div id=1234>this is a sample and sample text</div>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>"""
How can I search for the string "1234" and delete the entire line that contains it, so that the output is something like:
"<p>this is a sample text</p>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>"
Upvotes: 0
Views: 44
Reputation: 10681
You can get a list of the filtered lines with a list comprehension:
[ln for ln in sample_html.split('\n') if '1234' not in ln]
If you want a string back, pass that list to the str.join method:
'\n'.join([ln for ln in sample_html.split('\n') if '1234' not in ln])
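If the file stored in S3 might use Windows-style "\r\n" line endings, split('\n') leaves a stray "\r" at the end of each kept line. A minimal variant, assuming you want to preserve each line's original ending, uses str.splitlines(keepends=True) and rejoins with an empty string. The helper name drop_lines_containing is hypothetical:

```python
def drop_lines_containing(text: str, needle: str) -> str:
    """Remove every line that contains `needle`, keeping original line endings."""
    # keepends=True leaves "\n" or "\r\n" attached to each line,
    # so joining with "" restores the original layout of the kept lines.
    kept = [ln for ln in text.splitlines(keepends=True) if needle not in ln]
    return "".join(kept)


sample_html = "<p>a</p>\r\n<div id=1234>b</div>\r\n<p>c</p>"
print(repr(drop_lines_containing(sample_html, "1234")))
# '<p>a</p>\r\n<p>c</p>'
```

One difference from the split/join version: if the removed line is the last one, the line before it keeps its trailing newline.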
Upvotes: 2
Reputation: 494
Try this one-liner:
"\n".join(line for line in sample_html.split("\n") if "1234" not in line)
In [2]: sample_html = """<p>this is a sample text</p>
...: <div id=1234>this is a sample and sample text</div>
...: <figure>this is again a sample text</figure>
...: <p>this is a sample text</p>"""
In [3]: "\n".join(line for line in sample_html.split("\n") if "1234" not in line)
Out[3]: '<p>this is a sample text</p>\n<figure>this is again a sample text</figure>\n<p>this is a sample text</p>'
In [4]: print(_)
<p>this is a sample text</p>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>
In [5]:
Upvotes: 2
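As an alternative to splitting and rejoining, the standard-library re module can delete matching lines in one substitution. This is a sketch under the assumption that a plain substring match on each line is what you want; drop_lines_regex is a hypothetical name. re.MULTILINE makes ^ match at the start of every line, and re.escape guards against the needle containing regex metacharacters:

```python
import re


def drop_lines_regex(text: str, needle: str) -> str:
    # ^ matches at each line start because of re.MULTILINE; ".*" never crosses
    # a newline, and "\n?" also consumes the line break so no blank line remains.
    pattern = re.compile(r"^.*" + re.escape(needle) + r".*\n?", re.MULTILINE)
    return pattern.sub("", text)


sample_html = """<p>this is a sample text</p>
<div id=1234>this is a sample and sample text</div>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>"""
print(drop_lines_regex(sample_html, "1234"))
```

If the matching line is the last one and has no trailing newline, the preceding "\n" is left in place.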
Reputation: 9197
Your text consists of lines separated by the "hidden" newline character "\n", which acts as a line break. You can split the text on that character to get a list of lines, and then filter that list as shown below:
sample_html = '''<p>this is a sample text</p>
<div id=1234>this is a sample and sample text</div>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>'''
sample_html_out = []
for line in sample_html.split("\n"):
    # keep only the lines that do not contain "1234"
    if "1234" not in line:
        sample_html_out.append(line)
print("\n".join(sample_html_out))
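The same loop works on any iterable of lines, not only the list produced by split("\n"). A sketch, assuming the input is a file-like object such as one returned by open() or io.StringIO, turns the loop into a generator so lines are filtered lazily; filter_lines is a hypothetical name:

```python
import io
from typing import Iterable, Iterator


def filter_lines(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Yield only the lines that do not contain `needle`."""
    for line in lines:
        if needle not in line:
            yield line


sample_html = """<p>this is a sample text</p>
<div id=1234>this is a sample and sample text</div>
<figure>this is again a sample text</figure>
<p>this is a sample text</p>"""

# Iterating a file-like object yields each line with its trailing "\n",
# so joining with "" restores the original layout.
result = "".join(filter_lines(io.StringIO(sample_html), "1234"))
print(result)
```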
Or, as a one-liner:
print("\n".join([l for l in sample_html.split("\n") if "1234" not in l]))
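All of the approaches above drop any line containing "1234" anywhere, including in a paragraph's text. If only the element whose id attribute is 1234 should be removed, one option is a regex on the attribute itself. This sketch assumes the attribute sits on a single line and is written as id=1234, id="1234" or id='1234'; the helper drop_lines_with_id is hypothetical:

```python
import re


def drop_lines_with_id(text: str, id_value: str) -> str:
    # (?<![\w-]) keeps "data-id=" from matching; ["']? allows an optional quote;
    # (?![\w-]) keeps id="12345" from matching when id_value is "1234".
    pattern = re.compile(
        r"""(?<![\w-])id\s*=\s*["']?""" + re.escape(id_value) + r"(?![\w-])"
    )
    return "\n".join(ln for ln in text.split("\n") if not pattern.search(ln))


sample_html = """<p>order 1234 shipped</p>
<div id=1234>this is a sample and sample text</div>
<div id="12345">keep me</div>
<p>this is a sample text</p>"""
print(drop_lines_with_id(sample_html, "1234"))
```

For anything beyond simple, one-element-per-line markup, an HTML parser is more robust than line-based matching.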
Upvotes: 1