kittycat

Reputation: 15045

Match & Extract Multi-line Pattern In File

I made a Bash script to download this page http://php.net/downloads.php and then search it for the first occurrence of the latest PHP filename, version and MD5 sum. Right now it works, but it is split into two separate sed commands. When I try to combine the regexps into a single one, it won't match. I believe this is because the values I need are on two different lines.
How can I use one single sed pattern that gets all three matches, either into an array (preferred) or separated by spaces?

By the way, it does not have to be sed. I just want something that will most likely work on whatever system the script runs on, so no Perl, for instance.

wget -q http://php.net/downloads.php
FILE_INFO=$(sed -nr "s/.*(php-([0-9\.]+)\.tar\.bz2).*/\1 \2/p;T;q" downloads.php)
MD5SUM=$(sed -nr "s/.*md5: ([0-9a-f]{32}).*/\1/p;T;q" downloads.php)

echo "$FILE_INFO"
echo "$MD5SUM"

These are the two lines from the file that it needs to extract the info from:

  <a href="/get/php-5.4.5.tar.bz2/from/a/mirror">PHP 5.4.5 (tar.bz2)</a> [10,754Kb] -  19 July 2012<br />
  <span class="md5sum">md5: ffcc7f4dcf2b79d667fe0c110e6cb724</span>

Upvotes: 0

Views: 1193

Answers (2)

potong

Reputation: 58351

This might work for you (GNU sed):

sed '\|<a href="/get/php|!d;N;s/.*\(php-\([0-9\.]\+\)\.tar\.bz2\).*md5: \([0-9a-f]\{32\}\).*/\1 \2 \3/;q' file
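Since the question prefers an array, here is a minimal sketch of capturing this command's one-line output in a Bash array. It assumes bash (plain sh has no arrays) and GNU sed, and that the md5 line directly follows the download link, as in the question's sample. The helper name php_info and the temporary sample file are illustrative only; in the real script you would pass downloads.php instead.

```shell
#!/usr/bin/env bash
# Sketch: store the single line printed by the GNU sed command above
# in a Bash array. Assumes GNU sed and that the md5 line directly
# follows the download link line.

# Hypothetical helper; $1 is the downloaded page (e.g. downloads.php).
php_info() {
  sed '\|<a href="/get/php|!d;N;s/.*\(php-\([0-9\.]\+\)\.tar\.bz2\).*md5: \([0-9a-f]\{32\}\).*/\1 \2 \3/;q' "$1"
}

# Stand-in for downloads.php: the two lines quoted in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  <a href="/get/php-5.4.5.tar.bz2/from/a/mirror">PHP 5.4.5 (tar.bz2)</a> [10,754Kb] -  19 July 2012<br />
  <span class="md5sum">md5: ffcc7f4dcf2b79d667fe0c110e6cb724</span>
EOF

# read -a splits the output line on whitespace: file, version, md5.
read -r -a PHP_INFO <<< "$(php_info "$sample")"
rm -f "$sample"

echo "file:    ${PHP_INFO[0]}"
echo "version: ${PHP_INFO[1]}"
echo "md5:     ${PHP_INFO[2]}"
```

With the sample lines this prints php-5.4.5.tar.bz2, 5.4.5 and ffcc7f4dcf2b79d667fe0c110e6cb724 as the three array elements.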

Upvotes: 1

Stephane Rouberol

Reputation: 4384

sed -nr 's/.*(php-([0-9\.]+)\.tar\.bz2).*/\1 \2/p;s/.*md5: ([0-9a-f]{32}).*/\1/p;T;' downloads.php
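This command does not stop after the first release: it prints a line for every matching tar.bz2 link and for every md5 line in the file. A minimal sketch of keeping only the first two output lines and splitting them into one Bash array follows. It assumes bash, GNU sed, and that the tar.bz2 link comes before its md5 line with no other md5 line in between, as in the question's sample; the temporary sample file is only a stand-in for downloads.php.

```shell
#!/usr/bin/env bash
# Sketch: keep the first two lines of the two-expression sed above
# (name/version, then md5) and split them into one Bash array.
# Assumes GNU sed (-r) and that the tar.bz2 link precedes its md5 line.
# The answer's trailing "T;" has no effect (nothing follows it), so it
# is left out here.

# Stand-in for downloads.php: the two lines quoted in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  <a href="/get/php-5.4.5.tar.bz2/from/a/mirror">PHP 5.4.5 (tar.bz2)</a> [10,754Kb] -  19 July 2012<br />
  <span class="md5sum">md5: ffcc7f4dcf2b79d667fe0c110e6cb724</span>
EOF

# With -d '' read consumes all input, so newlines act as field
# separators; read then returns non-zero at end of input, hence "|| true".
read -r -d '' -a PHP_INFO < <(
  sed -nr 's/.*(php-([0-9\.]+)\.tar\.bz2).*/\1 \2/p;s/.*md5: ([0-9a-f]{32}).*/\1/p' "$sample" | head -n 2
) || true
rm -f "$sample"

printf '%s\n' "${PHP_INFO[@]}"
```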

Upvotes: 1
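Both answers are written for GNU sed (Stephane's uses the GNU-only T command, and potong's uses \+ in a basic regex, also a GNU extension), which may matter given the portability requirement in the question. As an alternative that is not taken from either answer, awk (specified by POSIX) can do the same in one pass. This is a sketch only; the function name extract_php_info is hypothetical and the temporary sample file stands in for downloads.php.

```shell
#!/bin/sh
# Sketch of a POSIX awk alternative (not from the answers above).
# It remembers the first php-X.Y.Z.tar.bz2 name, then prints it with
# its version and the hash from the next "md5:" line, and stops.

# Hypothetical helper; $1 is the downloaded page (e.g. downloads.php).
extract_php_info() {
  awk '
    name == "" && match($0, /php-[0-9.]+\.tar\.bz2/) {
      name = substr($0, RSTART, RLENGTH)           # php-5.4.5.tar.bz2
      version = substr(name, 5, length(name) - 12) # drop "php-" and ".tar.bz2"
      next
    }
    name != "" && match($0, /md5: [0-9a-f]+/) {
      print name, version, substr($0, RSTART + 5, RLENGTH - 5)
      exit
    }
  ' "$1"
}

# Stand-in for downloads.php: the two lines quoted in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  <a href="/get/php-5.4.5.tar.bz2/from/a/mirror">PHP 5.4.5 (tar.bz2)</a> [10,754Kb] -  19 July 2012<br />
  <span class="md5sum">md5: ffcc7f4dcf2b79d667fe0c110e6cb724</span>
EOF

RESULT=$(extract_php_info "$sample")
rm -f "$sample"
echo "$RESULT"
```

Unlike the sed versions, this does not require the md5 line to immediately follow the link line; it takes the first md5 line that appears after the first tar.bz2 link. The space-separated result can be split into an array in bash the same way as the sed output.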
