Dmitri A

Reputation: 110

Bash command to match the n-th line from the end

I have an HTML index file with a file/directory listing. It is just a typical file browser page, like:

...content here...    
<td><a href="20130011/">20120011/</a></td>
<td><a href="20130111/">20120111/</a></td>
<td><a href="20130211/">20120211/</a></td>
<td><a href="20130411/">20120411/</a></td>
...content here...

I can't figure out how to extract the second line from the bottom.

1) I downloaded the HTML with curl:

content=$(curl -sL "http://path-to-html")

2) Then I used:

dir=$(echo $content | sed '/.*href="\([0-9]*\/\)".*/!d;s//\1/;q')

which gives me the last match: 20120411.

But how do I get the previous one?

I don't know the total number of items in advance.
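The reason the command above returns the last match is worth spelling out, because it matters for any line-by-line solution: $content is unquoted, so the shell splits it into words and echo rejoins them with single spaces. The whole page reaches sed as one line, and the greedy .* in front of href= then lands on the last link. A small check, assuming the four sample rows above stand in for the downloaded page:

```shell
#!/bin/sh
# Sample rows from the question, standing in for the curl output.
content='<td><a href="20130011/">20120011/</a></td>
<td><a href="20130111/">20120111/</a></td>
<td><a href="20130211/">20120211/</a></td>
<td><a href="20130411/">20120411/</a></td>'

# Unquoted (intentionally): the shell word-splits $content and echo
# rejoins the words with single spaces, so the four rows become one line.
unquoted_lines=$(echo $content | wc -l | tr -d ' ')
# Quoted: the newlines are preserved.
quoted_lines=$(echo "$content" | wc -l | tr -d ' ')

# The question's command. On the single collapsed line, the greedy .* before
# href= gives back only enough text to match the last href, so \1 is that value.
last=$(echo $content | sed '/.*href="\([0-9]*\/\)".*/!d;s//\1/;q')

echo "unquoted: $unquoted_lines line(s); quoted: $quoted_lines line(s); sed: $last"
```

With "$content" quoted, the same sed would stop (q) at the first matching line instead, so a line-oriented fix has to quote the variable.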

Upvotes: 0

Views: 135

Answers (3)

NeronLeVelu

Reputation: 10039

dir=$(echo "$content" | sed -n '/href="\([0-9]\{1,\}\/\)"/ {s|.*href="\([0-9]\{1,\}/\)".*|-\1-|;H;}
$ {x;s|.*-\([0-9]\{1,\}/\)-\(\n-[0-9]\{1,\}/-\)\{1\}$|\1|p;}')

The 1 in \{1\}$ specifies how many entries to skip from the end: 1 gives the second-to-last, 2 the third-to-last, and so on.
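To choose the position without editing the script by hand, the count can be spliced in from a shell variable. A minimal sketch, assuming GNU sed and the four sample rows from the question; the function name nth_from_end is hypothetical:

```shell
#!/bin/sh
# Sample rows from the question, standing in for the curl output.
content='<td><a href="20130011/">20120011/</a></td>
<td><a href="20130111/">20120111/</a></td>
<td><a href="20130211/">20120211/</a></td>
<td><a href="20130411/">20120411/</a></td>'

# nth_from_end SKIP (hypothetical helper): reads HTML on stdin and prints the
# href value SKIP entries before the last one (1 = second-to-last).
# Each matching line is reduced to -VALUE- and appended to the hold space (H);
# on the last line (x) the regex keeps the entry followed by exactly SKIP more.
nth_from_end() {
    sed -n '/href="\([0-9]\{1,\}\/\)"/ {s|.*href="\([0-9]\{1,\}/\)".*|-\1-|;H;}
$ {x;s|.*-\([0-9]\{1,\}/\)-\(\n-[0-9]\{1,\}/-\)\{'"$1"'\}$|\1|p;}'
}

echo "$content" | nth_from_end 1   # second-to-last
echo "$content" | nth_from_end 2   # third-to-last
```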

Upvotes: 0

johnsyweb

Reputation: 141810

This program will print the penultimate line:

echo "${content}" | awk '{ pen = ult; ult = $0 } END { print pen }'

This will print the penultimate matching line:

echo "${content}" | awk '/href="([0-9]{8}\/)"/ { pen = ult; ult = $0 } END { print pen }'

If you just want to extract the first capture group (the three-argument form of match() is a GNU awk extension):

echo "${content}" | awk 'match($0, /href="([0-9]{8}\/)"/, a) { pen = ult; ult = a[1] } END { print pen }'

Putting it all together:

bash-4.2$ dir=$(curl -sL http://www.arteetmarte.no/tmp/index.html |
    awk 'match($0, /href="([0-9]{8}\/)"/, a) {
        pen = ult
        ult = a[1] 
    }
    END { 
        print pen 
    }
    ')
bash-4.2$ echo ${dir}
20130918/

Tested with: GNU Awk 4.1.0, API: 1.0
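The three-argument match() used above only works in GNU awk, and some older awk implementations do not support {8} interval expressions. The sketch below swaps in a different technique for other awks: the POSIX two-argument match(), which sets RSTART and RLENGTH, plus a small ring buffer so the position from the end becomes a parameter instead of being fixed at two. The function name nth_href_from_end is hypothetical, and the offsets 6 and 7 assume the exact href="NNN/" form from the sample rows:

```shell
#!/bin/sh
# Sample rows from the question, standing in for the curl output.
content='<td><a href="20130011/">20120011/</a></td>
<td><a href="20130111/">20120111/</a></td>
<td><a href="20130211/">20120211/</a></td>
<td><a href="20130411/">20120411/</a></td>'

# nth_href_from_end N (hypothetical helper): reads HTML on stdin and prints the
# href value of the N-th matching line counted from the end (1 = last,
# 2 = penultimate). Prints nothing if there are fewer than N matches; N >= 1.
nth_href_from_end() {
    awk -v n="$1" '
        # Two-argument match() is POSIX: it sets RSTART and RLENGTH.
        match($0, /href="[0-9]+\/"/) {
            # Skip the leading href=" (6 chars) and drop the closing quote
            # (1 char), storing the value in a ring buffer of the last n matches.
            buf[count % n] = substr($0, RSTART + 6, RLENGTH - 7)
            count++
        }
        END {
            # The n-th match from the end sits n slots behind the next write slot.
            if (count >= n) print buf[(count - n) % n]
        }
    '
}

dir=$(printf '%s\n' "$content" | nth_href_from_end 2)
echo "$dir"
```

In the real pipeline, printf '%s\n' "$content" would be replaced by the curl command, exactly as in the gawk version.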

Upvotes: 3

iruvar

Reputation: 23364

Maybe a bit easier with:

dir=$(echo "$content"|awk '/href=/{x=p;p=$0}END{sub(/.*">/,"",x);sub(/<.*/, "",x); print x}') 
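Note that the two sub() calls cut away the markup around the link text, so this one-liner prints the displayed name rather than the href value. In the question's sample the two differ: the third row links to 20130211/ but displays 20120211/. A quick check, assuming the four sample rows:

```shell
#!/bin/sh
# Sample rows from the question, standing in for the curl output.
content='<td><a href="20130011/">20120011/</a></td>
<td><a href="20130111/">20120111/</a></td>
<td><a href="20130211/">20120211/</a></td>
<td><a href="20130411/">20120411/</a></td>'

# x trails p by one matching line, so at END x holds the penultimate href line.
# sub(/.*">/) removes everything through the last "> on that line, and
# sub(/<.*/) removes the closing tags, leaving only the link text.
dir=$(echo "$content" | awk '/href=/{x=p;p=$0}END{sub(/.*">/,"",x);sub(/<.*/, "",x); print x}')
echo "$dir"
```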

Upvotes: 0
