
Find a pattern in HTML and replace it with PHP code

I want to find this pattern:

<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

and replace it with this pattern in a number of .html files:

<!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

The only difference is that this line:

<p class="text-muted">&copy; 2014. Core Team</p>

is replaced with

       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>

I tried doing it with sed, but after an initial attempt my difficulty is knowing which characters I do or do not have to escape. I would also like the tabs and newlines in the PHP code to appear exactly as they do here.

There are a number of files to do this to, so I would like to automate it, although it might be quicker to just do it manually (copy and paste). Maybe sed is the wrong approach here. Can someone kindly point me in the right direction? At this stage I am open to other languages (e.g. PHP, Python, bash).

I then plan to rename each .html file to .php with the following:

for i in *.html; do mv "$i" "${i%.*}.php"; done;
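For reference, here is a rough Python equivalent of that rename loop using only the standard library's pathlib. It is a sketch: the function name rename_html_to_php is hypothetical, and it assumes no .php file with the same name already exists (on POSIX, Path.rename silently overwrites an existing target, just like mv). Path.with_suffix replaces only the last extension, matching what ${i%.*} strips in the shell loop.

```python
from pathlib import Path

def rename_html_to_php(directory="."):
    """Rename every *.html file in `directory` to *.php, like the shell loop.

    Hypothetical helper; returns the new file names in sorted order.
    """
    renamed = []
    for path in sorted(Path(directory).glob("*.html")):
        target = path.with_suffix(".php")  # a.html -> a.php
        path.rename(target)                # overwrites an existing target on POSIX
        renamed.append(target.name)
    return renamed
```

Usage, from the directory holding the pages: rename_html_to_php(".").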

EDIT1

Based on the awk answer below, I can get it to work with this version:

$ awk -Wversion 2>/dev/null || awk --version
GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2, GNU MP 6.0.0)
Copyright (C) 1989, 1991-2014 Free Software Foundation.

However, with this version I get different output: it seems to print out all three files (old, new and file). Is this easily fixed in this version?

root@4461f768e343:/github/find_pattern# awk -Wversion 2>/dev/null || awk --version
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan

root@4461f768e343:/github/find_pattern# awk -v RS='^$' -v ORS= 'ARGIND==1{old=$0;next} ARGIND==2{new=$0;next} s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' old new file
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div><!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>some pile of text
or other
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>
and more maybe.root@4461f768e343:/github/find_pattern#


Answers (2)

Ed Morton


sed is for simple substitutions on individual lines, so your task is certainly not a job for sed. You could use awk if your files are all as well formatted as this:

$ cat old
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

$ cat new
<!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

$ cat file
some pile of text
or other
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>
and more maybe.

$ awk -v RS='^$' -v ORS= 'ARGIND==1{old=$0;next} ARGIND==2{new=$0;next} s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' old new file
some pile of text
or other
<!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>
and more maybe.

The above uses GNU awk for multi-char RS and ARGIND. If you want to do it for many files you could use:

find . -type f -name '*.php' -exec awk -i inplace -v RS='^$' -v ORS= 'ARGIND==1{old=$0;print;next} ARGIND==2{new=$0;print;next} s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' old new {} \;

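or similar. The extra print in the first two blocks writes old and new back unchanged, because -i inplace (GNU awk 4.1 or later) rewrites every file named on the command line, including those two.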
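Since ARGIND (like -i inplace) is specific to GNU awk, other awks such as the mawk 1.3.3 in the question's EDIT never match the ARGIND==1 or ARGIND==2 tests, so every file falls through to the final 1 and is printed, which appears to be what happened there. As a portable alternative, here is a rough Python sketch of the same idea. It reads the old and new snippets from files, finds the first literal occurrence of old with str.find (the analogue of awk's index), and splices new in with slicing (the analogue of substr). Because nothing is treated as a regex, nothing needs escaping. The function names, the old/new file names and the *.html glob are assumptions carried over from the examples above, as is UTF-8 encoding.

```python
from pathlib import Path

def splice_first(text, old, new):
    """Replace the first literal occurrence of `old` in `text` with `new`.

    Mirrors: s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old)) }
    Returns `text` unchanged if `old` is not found.
    """
    s = text.find(old)  # 0-based, -1 if absent (awk's index is 1-based, 0 if absent)
    if s == -1:
        return text
    return text[:s] + new + text[s + len(old):]

def update_files(old_path, new_path, pattern="*.html", directory="."):
    """Rewrite every file matching `pattern` in place; return the names changed."""
    old = Path(old_path).read_text(encoding="utf-8")
    new = Path(new_path).read_text(encoding="utf-8")
    if not old:
        raise ValueError("old snippet is empty")
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        result = splice_first(text, old, new)
        if result != text:  # only touch files that actually contained the snippet
            path.write_text(result, encoding="utf-8")
            changed.append(path.name)
    return changed
```

For example, running update_files("old", "new") from the directory containing the pages rewrites them in place, after which the rename loop from the question can be used. Like the awk, it replaces only the first occurrence; text.replace(old, new) would replace all of them.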


Doron Cohen


You can use Python's str.replace:

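import os

html_files = ['a.html', ...]  # list your .html files here
copyright = '<p class="text-muted">&copy; 2014. Core Team</p>'
new_copyright = """       <?php
        $year = date("Y");
        echo "<p class='text-muted'>© $year. Core Team</p>";
    ?>"""
for html_file_path in html_files:
    # Read and write as UTF-8 so the © character is handled regardless of locale.
    with open(html_file_path, encoding='utf-8') as html_file:
        html = html_file.read()

    # Only files containing the exact copyright line get a .php version.
    if copyright in html:
        # Swap only the final extension (a.html -> a.php).
        php_file_path = os.path.splitext(html_file_path)[0] + '.php'
        with open(php_file_path, 'w', encoding='utf-8') as php_file:
            php = html.replace(copyright, new_copyright)
            php_file.write(php)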

Note that this writes a new .php file next to each .html file instead of overwriting it, which is useful if the script has an error.
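The script above writes a .php file only when the exact copyright string is found, so a file whose footer differs even slightly (another year, different spacing, © instead of &copy;) is skipped without any message. Here is a minimal sketch for listing those files so they can be fixed by hand. The function name files_missing_snippet is hypothetical, and UTF-8 encoding is an assumption.

```python
from pathlib import Path

# The exact string the script above searches for.
COPYRIGHT = '<p class="text-muted">&copy; 2014. Core Team</p>'

def files_missing_snippet(paths, snippet=COPYRIGHT):
    """Return the paths (as strings) whose contents lack `snippet` verbatim."""
    missing = []
    for p in paths:
        # Assumes the pages are UTF-8 encoded.
        if snippet not in Path(p).read_text(encoding="utf-8"):
            missing.append(str(p))
    return missing
```

For example: print(files_missing_snippet(sorted(Path(".").glob("*.html")))).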

