Anis

Reputation: 17

How to extract specific value using grep and awk?

I am having trouble extracting a specific value from a .txt file using grep and awk. Below is an excerpt from the file:

 bravais-lattice index     =            2
 lattice parameter (alat)  =      10.0000  a.u.
 unit-cell volume          =     250.0000 (a.u.)^3
 number of atoms/cell      =            2
 number of atomic types    =            1
 number of electrons       =        28.00
 number of Kohn-Sham states=           18
 kinetic-energy cutoff     =      60.0000  Ry
 charge density cutoff     =     300.0000  Ry
 convergence threshold     =      1.0E-09
 mixing beta               =       0.7000

I have also defined two variables, ELEMENT and lat. I want to extract the "unit-cell volume" value (250.0000 in the excerpt above). I tried the following, using grep and awk:

volume=`grep "unit-cell volume" ./latt.10/$ELEMENT.scf.latt_$lat.out | awk '{printf "%15.12f\n",$5}'`

However, when I run the bash script I always get 00.000000 instead of the correct value of 250.00.

Can anyone help, please? Thanks in advance.

Upvotes: 0

Views: 4372

Answers (3)

Ed Morton

Reputation: 204638

You never need grep when you're using awk since awk can do anything useful that grep can do. It sounds like this is all you need:

$ awk -F'=' '/unit-cell volume/{printf "%.2f\n",$2}' file
250.00

The above works because with FS set to = the second field, $2, is <spaces>250.0000 (a.u.)^3. When awk is asked to convert a string to a number, it skips leading spaces and ignores everything after the numeric part, which leaves 250.0000 to be formatted by %.2f.
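To see that conversion rule in isolation, here is a minimal sketch that feeds two literal strings straight to printf in a BEGIN block. The variable names from_value and from_unit are only illustrative; the strings are copied from the question's sample line.

```shell
# Leading spaces and the trailing unit are ignored during conversion,
# so only the numeric prefix 250.0000 is used.
from_value=$(awk 'BEGIN { printf "%.2f\n", "     250.0000 (a.u.)^3" }')

# A string with no leading numeric part converts to 0.
from_unit=$(awk 'BEGIN { printf "%.2f\n", "(a.u.)^3" }')

echo "$from_value $from_unit"   # prints: 250.00 0.00
```

The second case is what happened in the question: the unit string was handed to a %f format and came out as zero.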

In the script you posted $5 was failing because the 5th space-separated field in:

    $1         $2    $3      $4         $5
<unit-cell> <volume> <=> <250.0000> <(a.u.)^3>

is (a.u.)^3 - you could have just added print $5 to see that.
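As a quick debugging aid along those lines, the sketch below prints every whitespace-separated field next to its index, so you can read off the right field number before writing the real command. The sample line is copied from the question; the variable names line and fields are only illustrative.

```shell
# Sample line copied from the question's output file.
line=' unit-cell volume          =     250.0000 (a.u.)^3'

# Print each field with its index, using awk's default whitespace splitting.
fields=$(printf '%s\n' "$line" |
    awk '{ for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i }')

printf '%s\n' "$fields"
```

The listing shows the number in $4 and the unit in $5, matching the diagram above.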

Upvotes: 1

James Brown

Reputation: 37464

Since you are processing key-value pairs where the key can contain a variable amount of whitespace, you would need to tune the field number ($4, $5, etc.) separately for each record you want to process, unless you set the field separator (FS) appropriately, for example FS=" *= *". Then the key will always be in $1 and the value in $2.

Then use split to separate the value from the unit.

Also, you can lose that grep by giving awk a pattern (or condition, /unit-cell volume/) for that print action:

$ awk 'BEGIN{FS=" *= *"} /unit-cell volume/{split($2,a," +");print a[1]}' file
250.0000

Explained:

$ awk '
BEGIN { FS=" *= *" }   # set appropriate field separator
/unit-cell volume/ {   # pattern or condition
    split($2,a," +")   # split value part to value and possible unit parts
    print a[1]         # output value part
}' file
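Because the key is always in $1 with this separator, the same idea extends to looking up any key by name. The following is only a sketch: get_value is a hypothetical helper (not part of awk or of the question's script), it reads the text on standard input, and the sample lines are copied from the question.

```shell
# A few lines copied from the question's output file.
sample=' lattice parameter (alat)  =      10.0000  a.u.
 unit-cell volume          =     250.0000 (a.u.)^3
 number of Kohn-Sham states=           18
 kinetic-energy cutoff     =      60.0000  Ry'

# Hypothetical helper: print the value stored under the exact key "$1".
get_value() {
    awk -v key="$1" '
        BEGIN { FS = " *= *" }      # key in $1, value (plus unit) in $2
        {
            k = $1
            sub(/^ +/, "", k)       # drop the leading indentation from the key
            if (k == key) {
                split($2, a, " +")  # a[1] is the value, a[2] the unit if any
                print a[1]
                exit
            }
        }'
}

volume=$(printf '%s\n' "$sample" | get_value "unit-cell volume")
states=$(printf '%s\n' "$sample" | get_value "number of Kohn-Sham states")
cutoff=$(printf '%s\n' "$sample" | get_value "kinetic-energy cutoff")
echo "$volume $states $cutoff"
```

Note that this also handles the "number of Kohn-Sham states=" line, where there is no space before the = sign and whitespace-based field counting would give a different field number.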

Upvotes: 0

David Z

Reputation: 131800

awk '{printf "%15.12f\n",$5}'

You're asking awk to print out the fifth field of the line ($5).

 unit-cell volume          =     250.0000 (a.u.)^3
 1         2               3     4        5

The fifth field is (a.u.)^3, which you are then asking awk to interpret as a number via the %f format code. It's not a number, though (or actually, doesn't start with a number), and when awk is asked to treat a non-numeric string as a number, it uses 0 instead. Thus it prints 0.

Solution: use $4 instead.

By the way, you can skip invoking grep by using awk itself to select the line, e.g.

awk '/^ unit-cell/ {...}' file

The /^ unit-cell/ is a regular expression that matches "unit-cell" (with a leading space) at the beginning of the line. Adjust as necessary if you have other lines that start with unit-cell which you don't want to select.
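Putting both suggestions together, a complete version might look like the sketch below. It writes two sample lines to a temporary file as a stand-in for your real output file; the names out and volume and the %.4f format are just choices for this example, and in your script you would pass your own file path to awk instead.

```shell
# Stand-in for the real output file, with two lines copied from the question.
out=$(mktemp)
printf '%s\n' \
    ' number of atoms/cell      =            2' \
    ' unit-cell volume          =     250.0000 (a.u.)^3' > "$out"

# awk selects the line itself (no grep) and formats the fourth field.
volume=$(awk '/^ unit-cell volume/ { printf "%.4f\n", $4 }' "$out")

echo "volume=$volume"
rm -f "$out"
```

Using $(...) instead of backticks is optional, but it nests more cleanly and is easier to read.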

Upvotes: 3
