nopuck4you

Reputation: 1700

How to replace markup in HTML files stored on Unix/Solaris servers?

I'm looking for a way to find a piece of markup that appears in 1000+ HTML files published on Unix servers (served by Apache) and replace it with either nothing or alternate HTML markup.

ex:

Find

<div id="someComponent"> .....{a bunch of interior markup} .... </div>

Replace with {empty}

ex 2:

Find </div></body>

Replace </div>{some HTML markup needed here}</body>

Upvotes: 0

Views: 217

Answers (3)

yogsototh

Reputation: 15081

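If it is really simple (no parsing needed, the markup is well known, and the target elements are never nested inside one another), the fastest way should be:

(In Zsh. The trailing (.) is a Zsh glob qualifier that limits matches to regular files; in Bash, run shopt -s globstar first and drop the (.).)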

perl -pi -e 's#<div class="toto">.*?</div>#<span>new content</span>#g' /path/to/files/**/*.html(.)

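That should replace every ...<div class="toto">.....</div>... with ...<span>new content</span>.... Because -p processes the file one line at a time, the opening and closing tags must be on the same line for the pattern to match.

But beware: it will NOT work for nested elements such as ...<div class="toto"> ... <div class="toto"> ... </div> ... </div> ...., because the non-greedy match stops at the first </div>.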
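For comparison, here is a minimal Python sketch of the same regex idea, in case the blocks span several lines. The function names, the UTF-8 encoding, and the in-place rewrite are assumptions made for illustration, not part of the one-liner above. It also covers the second example from the question (inserting markup before </body>).

```python
import re
from pathlib import Path

# Non-greedy match from the opening tag to the FIRST closing </div>.
# re.DOTALL lets "." cross newlines, so multi-line blocks match too.
DIV_RE = re.compile(r'<div class="toto">.*?</div>', re.DOTALL)


def replace_div(html, replacement="<span>new content</span>"):
    """Replace every non-nested <div class="toto">...</div> block."""
    # A function is used as the replacement so backslashes in it stay literal.
    return DIV_RE.sub(lambda match: replacement, html)


def insert_before_body_close(html, extra):
    """Turn '</div></body>' into '</div>' + extra + '</body>'."""
    return html.replace("</div></body>", "</div>" + extra + "</body>")


def rewrite_tree(root):
    """Apply replace_div to every .html file under root, in place.

    Assumes the files are UTF-8; adjust the encoding if they are not.
    """
    for path in Path(root).rglob("*.html"):
        text = path.read_text(encoding="utf-8")
        new_text = replace_div(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
```

The nesting limitation is the same as with the Perl version: on a nested input the match ends at the inner </div> and leaves the outer closing tag behind, so back up the files (or test on a copy) before calling rewrite_tree.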

Upvotes: 1

Walter Mundt

Reputation: 25271

If the markup is written the same way in all the files, sed or perl will be much quicker than BeautifulSoup or the like, but a regex is harder to make robust against the various ways the same HTML markup can be expressed in text form.

Do you have a more concrete example of what kind of markup you're looking for, and ideally how it might vary from file to file? Where in the file will it be? Also, is it okay to prettify or tidy the HTML in the process if necessary?

Oh, and are you running something on the server(s), or do you need code to spider the server to retrieve the HTML files for processing?

Upvotes: 0

jldupont

Reputation: 96806

One way to do it: use Python with BeautifulSoup to parse each HTML file, do the replacement, and write the result back.
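A rough sketch of the parse-and-rewrite idea follows. BeautifulSoup is a third-party package, so to keep the example dependency-free it swaps in Python's built-in html.parser instead; the class name ElementReplacer and the function replace_element are hypothetical, and the target is assumed to be a normal element with a closing tag, identified by its id as in the question.

```python
from html.parser import HTMLParser


class ElementReplacer(HTMLParser):
    """Copy HTML through, replacing one element (found by id) and its children."""

    def __init__(self, target_id, replacement=""):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.replacement = replacement
        self.out = []
        self.skip_tag = None  # tag name of the element being dropped
        self.depth = 0        # nesting depth of same-named tags inside it

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.skip_tag, self.depth = tag, 1
            self.out.append(self.replacement)
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> never changes the depth.
        if not self.skip_tag:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth -= 1
                if self.depth == 0:
                    self.skip_tag = None
        else:
            self.out.append("</%s>" % tag)

    def _keep(self, text):
        if not self.skip_tag:
            self.out.append(text)

    def handle_data(self, data):
        self._keep(data)

    def handle_entityref(self, name):
        self._keep("&%s;" % name)

    def handle_charref(self, name):
        self._keep("&#%s;" % name)

    def handle_comment(self, data):
        self._keep("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._keep("<!%s>" % decl)

    def handle_pi(self, data):
        self._keep("<?%s>" % data)


def replace_element(html, target_id, replacement=""):
    """Return html with the element whose id is target_id replaced."""
    parser = ElementReplacer(target_id, replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling replace_element(text, "someComponent") drops the whole component, nested divs included, which is the case a plain regex gets wrong. With BeautifulSoup itself the equivalent would likely be to locate the tag with soup.find(id="someComponent") and call decompose() on it, then write str(soup) back, at the cost of the parser possibly normalizing the rest of the markup.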

Upvotes: 1
