Extract text between two strings in simple example.html file

Question

I have a very basic HTML file called example.html (see below):




    
        
            Lorem ipsum...
        
        
            Lorem ipsum...
        
        
            Lorem ipsum...

and I'd like to extract only the part shown below, without simply deleting the first and last 3 lines.


    Lorem ipsum...

I have tried with awk:

cat example.html | awk '/^<div class="research">$/,/^<\/div>$/ { print }'

but something seems to be wrong.

I also tried with body tag (see below)

cat example.html | awk '/^<body>$/,/^<\/body>$/ { print }'

(result)



    
        
            Lorem ipsum...
        
        
            Lorem ipsum...
        
        
            Lorem ipsum...

And it's working correctly.

What am I doing wrong?

Thanks in advance.

glenn jackman · Accepted Answer

xmlstarlet sel -t -c '//div[@class="research"]' -nl example.html


        
            Lorem ipsum...
        
        
            Lorem ipsum...
        
        
            Lorem ipsum...
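
As for why the awk range may have printed nothing for the div while the body version worked: one likely cause is the `^` and `$` anchors. If the div line is indented, a pattern that starts with `^<div` can never match it, whereas an unindented `<body>` line matches fine. The sketch below illustrates this under assumptions: the sample markup is hypothetical (the real example.html is not reproduced here), and the `class="research"` opening tag is taken from the xmlstarlet expression above.

```shell
# Hypothetical sample file; the layout (an indented <div class="research">)
# is an assumption, not the asker's actual markup.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<html>
<body>
    <div class="research">
        <p>
            Lorem ipsum...
        </p>
    </div>
</body>
</html>
EOF

# Strictly anchored range: the div line begins with spaces, so ^<div never
# matches and the range never opens.
anchored=$(awk '/^<div class="research">$/,/^<\/div>$/' "$tmp")

# Same range, but allowing leading and trailing blanks around each tag.
tolerant=$(awk '/^[ \t]*<div class="research">[ \t]*$/,/^[ \t]*<\/div>[ \t]*$/' "$tmp")

printf '%s\n' "$tolerant"
rm -f "$tmp"
```

Note that an awk range stops at the first line matching the end pattern, so nested div elements would cut the output short. That is one reason a real parser such as xmlstarlet tends to be the safer choice for markup.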
