Ranieri Mazili

Reputation: 833

How to get only part of a line using grep/sed/awk with regex?

I have an HTML file from which I need to extract only a specific part. The biggest challenge is that the file has no line breaks, so my grep expression isn't working well.

Here is my HTML file:

<a href="/link1" param1="data1_1" param2="1_2"><p>Test1</p></a><a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>

Note that I have two anchors (<a>) on this line.

I want to get the second anchor and I was trying to get it using:

cat example.html | grep -o "<a.*Test2</p></a>"

Unfortunately, this command returns the whole line, but I want only:

<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>

I don't know how to do this with grep or sed. I'd really appreciate any help.

Upvotes: 1

Views: 241

Answers (3)

Jahid

Reputation: 22428

This should do it; [^>]* keeps the match inside a single opening tag:

grep -o '<a[^>]*><p>Test2</p></a>' example.html
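To see the difference side by side, here is a minimal sketch that runs both patterns against the line from the question. The variable names (line, greedy, bounded) are only for illustration, and the input is fed through printf instead of example.html so the snippet is self-contained.

```shell
# Sample input: the single line from the question (no line breaks).
line='<a href="/link1" param1="data1_1" param2="1_2"><p>Test1</p></a><a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>'

# Greedy .* starts at the first <a and extends as far as it can,
# so the match covers both anchors.
greedy=$(printf '%s\n' "$line" | grep -o '<a.*Test2</p></a>')

# [^>]* cannot cross a '>', so the match has to begin at the <a
# whose opening tag is immediately followed by <p>Test2</p></a>.
bounded=$(printf '%s\n' "$line" | grep -o '<a[^>]*><p>Test2</p></a>')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

This relies on the attribute values not containing a literal > character, which holds for the sample line but is not guaranteed for arbitrary HTML.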

Upvotes: 0

Andreas Louv

Reputation: 47099

Using Perl:

$ perl -pe '@a = split(m~(?<=</a>)~, $_);$_ = $a[1]' file
<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>

Breakdown:

perl -pe '                                       ' # Read line by line into $_
                                                   # and print $_ at the end
                     m~(?<=</a>)~                  # Match the position after
                                                   # each </a> tag
          @a = split(            , $_);            # Split into array @a
                                       $_ = $a[1]  # Take second item

Upvotes: 0

Ed Morton

Reputation: 203209

With GNU awk for multi-char RS, if it's the second record you want:

$ awk 'BEGIN{RS="</a>"; ORS=RS"\n"} NR==2' file
<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>

or if it's the record labeled "Test2":

$ awk 'BEGIN{RS="</a>"; ORS=RS"\n"} /<p>Test2<\/p>/' file
<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>

or:

$ awk 'BEGIN{RS="</a>"; ORS=RS"\n"; FS="</?p>"} $2=="Test2"' file
<a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>
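If GNU awk is not available, a rough alternative is to leave RS alone and use split() on each line instead. This is only a sketch and not part of the approach above: the names line, parts, and second are made up for the example, and it assumes an awk that treats a multi-character split() separator as a regular expression, as POSIX describes.

```shell
# Sample input: the single line from the question.
line='<a href="/link1" param1="data1_1" param2="1_2"><p>Test1</p></a><a href="/link2" param1="data1_1" param2="1_2"><p>Test2</p></a>'

# Split the line on "</a>", then print the piece that contains
# <p>Test2</p>, putting back the separator that split() removed.
second=$(printf '%s\n' "$line" | awk '{
    n = split($0, parts, "</a>")
    for (i = 1; i <= n; i++)
        if (parts[i] ~ /<p>Test2<\/p>/)
            print parts[i] "</a>"
}')

printf '%s\n' "$second"
```

Replacing the regex test with i == 2 would select the second anchor by position instead of by label.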

Upvotes: 1
