Reputation: 963
I have some messy HTML that looks like this:
<div id=":0.page.0" class="page-element" style="width: 1620px;">
<div>
<img src="viewer_files/viewer_004.png" class="page-image" style="width: 800px; height: 1131px; display: none;">
<img src="viewer_files/viewer_005.png" class="page-image" style="width: 1600px;">
</div>
</div>
<!-- this repeats 100+ times with different 'src' attributes -->
This is actually all on one line (I have split it across multiple lines for readability). I am trying to remove all <img> tags that have display: none; set in the inline CSS. Is it possible to use sed/awk or some other Unix command to achieve this? I think it would have been easy if it were a well-indented HTML document.
Upvotes: 0
Views: 1681
Reputation: 1
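sed has several commands, but most people only learn the substitute command, "s". Another useful command, "d", deletes every line that matches the given pattern:
sed -e "/<img[^>]*display: none;[^>]*>/d" File
Be careful: this deletes the entire line, not just the matching tag. Since your HTML is all on one line, the tags would first have to be split onto separate lines.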
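To make the caveat concrete, here is a small sketch of what "d" does to single-line input. The sample HTML and the variable names (html, remaining) are made up for illustration; only the sed expression comes from the answer above.

```shell
# Hypothetical one-line sample, shaped like the markup in the question.
html='<div><img src="a.png" style="display: none;"><img src="b.png" style="width: 1600px;"></div>'

# "d" drops every line matching the pattern. The whole document is one line,
# so the visible <img> and the surrounding <div> are deleted as well.
remaining=$(printf '%s\n' "$html" | sed -e '/<img[^>]*display: none;[^>]*>/d')

# Print what survived, in brackets so an empty result is visible.
printf 'remaining: [%s]\n' "$remaining"
```

Nothing is left between the brackets, which is why this approach only helps if each tag sits on its own line.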
Upvotes: 0
Reputation: 17757
sed -e "s/<img[^>]*display: none;[^>]*>//g" filein
A quick explanation of the sed command:
s stands for substitution, and the / characters are delimiters.
The first field is the pattern to search for, the second is its replacement (empty here, so each match is removed), and the last holds options. g means global (replace every match on the line, not just the first).
To edit the file in place: sed -i -e "..."
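As a rough check of the substitution on single-line input, the sketch below runs it against a made-up sample. The sample HTML and the variable names (html, cleaned) are assumptions for illustration. The pattern is also loosened slightly from the one above, using display:[[:space:]]*none so that both "display:none" and "display: none" match; that tweak is my own and not part of the original answer.

```shell
# Hypothetical one-line sample, shaped like the markup in the question.
html='<div><img src="a.png" class="page-image" style="width: 800px; display: none;"><img src="b.png" class="page-image" style="width: 1600px;"></div>'

# Remove each <img ...> tag whose attributes contain display:none.
# [^>]* cannot cross a ">", so a match never spills into the next tag.
cleaned=$(printf '%s\n' "$html" | sed -e 's/<img[^>]*display:[[:space:]]*none[^>]*>//g')

# Print the line with the hidden image removed.
printf '%s\n' "$cleaned"
```

Only the hidden image is dropped; the visible one and the enclosing div stay on the line. This still relies on the markup being as regular as the sample, for instance no ">" inside an attribute value.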
Upvotes: 1
Reputation: 72755
I would use either Twig or XMLStarlet for this kind of processing. They are a lot more reliable than sed/awk/grep. That said, since your pattern is regular and repeating, sed/awk/grep would work too.
Upvotes: 3
Reputation: 1519
HTML and regexes are a notoriously bad match, so you probably want something that is HTML-aware. I'd probably go for something like TagSoup, but there are no doubt other options that are more shell-friendly, or suitable for any favourite scripting language you may have.
Upvotes: 3