Jack

Reputation: 361

Best way to parse this particular string using awk / sed?

I need to get a particular version string from a file (call it version.lst) and compare it against another version in a shell script. For example's sake, the file contains lines that look like this:

V1.000 -- build date and other info here -- APP1
V1.000 -- build date and other info here -- APP2
V1.500 -- build date and other info here -- APP3

.. and so on. Let's say I am trying to grab the first version (in this case, V1.000) from APP1. Obviously, the versions can change and I want this to be dynamic. What I have right now works:

var=`cat version.lst | grep " -- APP1" | grep -Eo 'V[0-9].[0-9]{3}'`

Pipe to grep will get the line containing APP1 and the second pipe to grep will get the version string. However, I hear grep is not the way to do this so I'd like to learn the best way using awk or sed. Any ideas? I am new to both and haven't found a tutorial easy enough to learn the syntax of it. Do they support egrep? Thanks!

Upvotes: 6

Views: 18015

Answers (3)

Dennis Williamson

Reputation: 360685

Try this to get the complete version:

#!/bin/sh
app=APP1
var=$(awk -v "app=$app" '$NF == app {print $1}' version.lst)

or to get only the major version number, the last line could be:

var=$(awk -v "app=$app" '$NF == app {split($1,a,"."); print a[1]}' version.lst)

Using sed to get the complete version:

var=$(sed -n "/ $app\$/s/^\([^ ]*\).*/\1/p" version.lst)

or this to get only the major version number:

var=$(sed -n "/ $app\$/s/^\([^.]*\).*/\1/p" version.lst)

Explanations:

The second AWK command:

  • -v "app=$app" - set an AWK variable equal to a shell variable
  • $NF == app - if the last field is equal to the contents of the variable (NF is the number of fields, so $NF is the contents of the NFth field)
  • {split($1,a,".") - then split the first field at the dot
  • print a[1] - and print the first part of the result of the split
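
As a rough illustration of the two AWK commands above, the following sketch runs them against a throwaway copy of the sample lines from the question. The temporary file, the use of mktemp, and the variable names full and major are assumptions made for the demo, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical sample data: the three example lines from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
V1.000 -- build date and other info here -- APP1
V1.000 -- build date and other info here -- APP2
V1.500 -- build date and other info here -- APP3
EOF

app=APP3

# Complete version: print field 1 of the line whose last field equals app.
full=$(awk -v "app=$app" '$NF == app {print $1}' "$tmp")

# Major version only: split field 1 at the dot and keep the first piece.
major=$(awk -v "app=$app" '$NF == app {split($1, a, "."); print a[1]}' "$tmp")

printf '%s %s\n' "$full" "$major"   # prints: V1.500 V1
rm -f "$tmp"
```

Because the comparison is $NF == app rather than a regular expression, the application name is matched exactly, so APP3 would not be confused with a name such as APP33.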

The sed commands:

  • -n - don't print any output unless directed to
  • "/ $app\$/ - for any line that ends with (\$) a space followed by the contents of the shell variable $app (note that double quotes are used so that the shell expands the variable, and it's a good idea to escape the second dollar sign)
  • s/^\([^ ]*\).*/\1/p" - starting at the beginning of the line (^), capture (\( \)) a run of zero or more (*) non-space characters ([^ ]) (non-dots, [^.], in the second version), then match without capturing the rest of the line (.*). Replace the matched text (the whole line in this case) with the captured string (the version number; \1 refers to the first, and here only, capture group) and print the result (p)
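
The sed commands can be tried the same way. This is a minimal sketch under the same assumptions as before (a temporary file created with mktemp, illustrative variable names); the extra MYAPP1 line is made up to show why the address starts with a space.

```shell
#!/bin/sh
# Hypothetical sample data; the MYAPP1 line is invented for the demo.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
V9.999 -- build date and other info here -- MYAPP1
V1.000 -- build date and other info here -- APP1
V1.500 -- build date and other info here -- APP3
EOF

app=APP1

# Address "/ APP1$/" selects lines ending in " APP1"; MYAPP1 does not qualify.
# The substitution keeps only the leading run of non-space characters.
full=$(sed -n "/ $app\$/s/^\([^ ]*\).*/\1/p" "$tmp")

# Same address, but keep only the leading run of non-dot characters.
major=$(sed -n "/ $app\$/s/^\([^.]*\).*/\1/p" "$tmp")

printf '%s %s\n' "$full" "$major"   # prints: V1.000 V1
rm -f "$tmp"
```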

Upvotes: 12

$ awk '/^V1\.00.* APP1$/{print $NF}' version.lst
APP1

That regular expression matches lines that start with "V1.00", followed by any number of any other characters, ending with " APP1". The backslash in the middle there might be really important--it matches only ".", and so it excludes (probably corrupt) lines that might begin with, say, "V1a00". The space before "APP1" excludes things like "APP2_APP1".
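
To see what the escaped dot and the leading space buy you, here is a small sketch that counts matching lines with and without the backslash. The corrupt V1a000 line and the APP2_APP1 line are invented test data, and the temporary file is an assumption of the demo.

```shell
#!/bin/sh
# Hypothetical test data: one good line, one corrupt version, one look-alike name.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
V1.000 -- build date and other info here -- APP1
V1a000 -- corrupt line -- APP1
V1.000 -- build date and other info here -- APP2_APP1
EOF

# Escaped dot: only a literal "." is accepted after "V1".
escaped=$(awk '/^V1\.00.* APP1$/ {n++} END {print n+0}' "$tmp")

# Unescaped dot: any character is accepted, so "V1a000" slips through.
unescaped=$(awk '/^V1.00.* APP1$/ {n++} END {print n+0}' "$tmp")

printf '%s %s\n' "$escaped" "$unescaped"   # prints: 1 2
rm -f "$tmp"
```

In both cases the APP2_APP1 line is rejected, because the pattern requires a space immediately before APP1.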

"NF" is an automatically generated variable that contains the number of fields in the input line. It's also the number of the last field, which happens to be the one you're interested in.

There are a couple of ways to prune off the "V1". Here's one way, although you and I might not be talking about quite the same thing.

$ awk '/^V1\.00.* APP1$/{print substr($1, 1, index($1, ".") - 1), $NF}' version.lst
V1 APP1

Upvotes: 2

Flavius Stef

Reputation: 13798

If I understood correctly: egrep "APP1$" version.lst | awk '{print $1}'
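
Since awk can filter lines with a regular expression on its own, the same pipeline can probably be collapsed into a single awk call. The sketch below is one way to do that, not the answerer's code; the temporary file and the variable name ver are assumptions for the demo.

```shell
#!/bin/sh
# Hypothetical sample data: the three example lines from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
V1.000 -- build date and other info here -- APP1
V1.000 -- build date and other info here -- APP2
V1.500 -- build date and other info here -- APP3
EOF

# Select lines ending in APP1 and print their first field, all in awk.
ver=$(awk '/APP1$/ {print $1}' "$tmp")

printf '%s\n' "$ver"   # prints: V1.000
rm -f "$tmp"
```

Like the egrep version, the pattern APP1$ would also match a name that merely ends in APP1, so an exact comparison on the last field is safer when names can overlap.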

Upvotes: 3
