Chubaka

Reputation: 3135

awk next and pattern match

Given the following CSV file, I only want $9 from the "DELTA Energy Terms" section, excluding the line that starts with "Frame".

Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904
1,0.0,0.0,-33.1958,2.71419624,80.6403,0.0,-30.48160376,50.15869624

DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
2,-42.5618,0.0,44.0748,-5.2738956,-26.6719,-42.5618,38.8009044,-30.4327956
3,-43.1034,0.0,41.3681,-5.25029544,-27.1501,-43.1034,36.11780456,-34.13569544

Desired output:

-31.6726
-34.9402
-30.4327
-34.1356

The following attempts print every $9, including the $9 values from the "Ligand Energy Terms" section.

awk -F, '$1 ~ /DELTA Energy Terms/ {next} $1 ~ /Frame/ {next} {printf("%24.4f\n",$9)}'

awk -F, '$1 ~ /DELTA Energy Terms/ {next}  {printf("%24.4f\n",$9)}'

Could any guru enlighten me?

Upvotes: 0

Views: 2581

Answers (4)

BWhite

Reputation: 853

All of these solutions work and solve the immediate problem, but none answers the implied question.

To review the command in question, why doesn't this work?

'$1 ~ /DELTA Energy Terms/ {next} $1 ~ /Frame/ {next} {printf("%24.4f\n",$9)}'

Let's break it down.

# Skip every line where the first field matches.
$1 ~ /DELTA Energy Terms/ {next}
  # With -F, as in the question, the title line contains no comma, so $1 is the
  # whole line, the pattern matches, and that one line is skipped.
  # (Without -F, fields would break on white space and $1 would be just "DELTA",
  # so nothing would match.)
  # Either way, next only abandons the current line. It has no effect on the
  # data lines that follow, so it cannot select or reject a whole section.

# Skip every line where the first field matches "Frame".
$1 ~ /Frame/ {next}
  # This matches both "Frame #" header lines, and they get skipped.

# Print every line that didn't get skipped.
{printf("%24.4f\n",$9)}
  # That is every data line in both sections, which is why the "Ligand" values show up.
  # The "Ligand Energy Terms" title line and the blank lines have no field 9,
  # so %f formats the empty string as 0.0000 for those lines.
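
To make the per-line behaviour of next concrete, here is a minimal sketch that runs the question's command next to a version that remembers the current section. The file name sample.csv and the flag name in_delta are only illustrative, and the sample is a shortened copy of the question's data.

```shell
# Build a small sample file (hypothetical name: sample.csv).
cat > sample.csv <<'EOF'
Ligand Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,G gas,G solv,TOTAL
0,0.0,0.0,-37.2465,2.70257904,98.8916,0.0,-34.54392096,64.34767904

DELTA Energy Terms
Frame #,VDWAALS,EEL,EGB,ESURF,ESCF,DELTA G gas,DELTA G solv,DELTA TOTAL
0,-43.3713,0.0,44.4036,-5.24443392,-27.4605,-43.3713,39.15916608,-31.67263392
1,-43.7597,0.0,37.343,-5.1764544,-23.3471,-43.7597,32.1665456,-34.9402544
EOF

# "next" on its own is stateless: only the matching title/header lines are
# dropped, so data rows from BOTH sections (and blank lines) reach printf.
stateless=$(awk -F, '$1 ~ /DELTA Energy Terms/ {next} $1 ~ /Frame/ {next} {printf("%.4f\n",$9)}' sample.csv)

# A flag (in_delta, an illustrative name) remembers which section we are in.
stateful=$(awk -F, '
    /Energy Terms/   { in_delta = ($0 ~ /^DELTA/); next }  # title line: set or clear the flag
    /^Frame/ || !NF  { next }                              # skip column headers and blank lines
    in_delta         { printf("%.4f\n", $9) }              # data row inside the DELTA section
' sample.csv)

printf 'stateless:\n%s\nstateful:\n%s\n' "$stateless" "$stateful"
```

The stateless run prints five lines (including the Ligand value and two 0.0000 lines), while the stateful run prints only the two DELTA values. Note that %.4f rounds, so -34.9402544 prints as -34.9403, whereas the desired output in the question looks truncated.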

Upvotes: 0

D.Shawley

Reputation: 59553

The following should do the trick

awk -F, '/^DELTA/ {capture=1} /Energy Terms$/ {next} /^Frame/ {next} (capture) {print $9}'

I use a capture flag to control whether a record should be printed. By default, capture is unset, which awk treats as zero. When the DELTA Energy Terms line is read, I start capturing. I skip any line that ends in Energy Terms or starts with Frame. Otherwise, if we are "capturing", I print the ninth field.

If you are using this script regularly, I recommend using something like the following script:

#!/usr/bin/awk -f
BEGIN {
    FS = ","
}
/^DELTA Energy Terms/ {
    capture = 1;
    next
}
/Energy Terms$/ {
    capture = 0;
    next
}
/^Frame/ { next }
(capture) { print $9 }

Save the script as extract-delta and make it executable, then you can use it just like any other shell command:

$ cat input-file | tr -d '\015' | ./extract-delta
-31.67263392
-34.9402544
-30.4327956
-34.13569544
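
The tr -d '\015' step is there for files with DOS-style line endings: with a trailing carriage return on each line, /Energy Terms$/ would never match. As a rough sketch, assuming CRLF input, the same cleanup can be done inside awk. The function name extract_delta and the shortened sample data are made up for illustration.

```shell
# Hypothetical helper: same capture-flag logic as the script above, but
# carriage returns are stripped inside awk, so no separate tr step is needed.
extract_delta() {
    awk -F, '
        { sub(/\r$/, "") }                           # drop a trailing CR, if any
        /^DELTA Energy Terms/ { capture = 1; next }  # start of the wanted section
        /Energy Terms$/       { capture = 0; next }  # any other section title
        /^Frame/ || $0 == ""  { next }               # skip headers and blank lines
        capture               { print $9 }
    ' "$@"
}

# Usage with CRLF input on stdin (illustrative data):
printf 'Ligand Energy Terms\r\nFrame #,a,b,c,d,e,f,g,TOTAL\r\n0,1,2,3,4,5,6,7,64.3\r\n\r\nDELTA Energy Terms\r\nFrame #,a,b,c,d,e,f,g,DELTA TOTAL\r\n0,1,2,3,4,5,6,7,-31.67263392\r\n' | extract_delta
```

Because sub() modifies $0, awk re-splits the fields afterwards, so $9 no longer carries the carriage return.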

Upvotes: 3

Avinash Raj

Reputation: 174696

You could try the awk command below.

$ awk -v RS="\n\n" -v FS="\n" '/^DELTA Energy Terms/{for(i=3;i<=NF;i++){split($i, a, /,/);print a[9]}}' RS=  file
-31.67263392
-34.9402544
-30.4327956
-34.13569544
  • RS="\n\n" is meant to make a blank line the record separator. The trailing RS= assignment before the file name then sets RS to the empty string (awk's paragraph mode), which also splits records on blank lines.
  • FS="\n" sets the newline character as the field separator, so each line of a record is one field.
  • /^DELTA Energy Terms/: if a record starts with DELTA Energy Terms, run the following action on that record.
  • {for(i=3;i<=NF;i++){split($i, a, /,/);print a[9]}} iterates over all fields except 1 and 2 (the title and the Frame header), splits each field on commas, and stores the split items in an array named a.
  • print a[9] prints the element at index 9 of the associative array a.
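
To see what the record and field separators do, here is a small sketch on made-up data (the titles and values are placeholders, not the question's file). It prints one summary line per blank-line-separated block.

```shell
# Paragraph mode: an empty RS makes each blank-line-separated block one record.
# With FS="\n", every line of the block becomes one field of that record.
out=$(printf 'Title A\nh1,h2\n1,2\n\nTitle B\nh1,h2\n3,4\n5,6\n' |
    awk 'BEGIN { RS = ""; FS = "\n" }
         { printf "record %d: title=%s, lines=%d\n", NR, $1, NF }')
printf '%s\n' "$out"
```

This reports a three-line record titled "Title A" and a four-line record titled "Title B", which is why the loop in the answer starts at field 3: fields 1 and 2 are the title and the header.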

Upvotes: 1

dinox0r

Reputation: 16039

You can also accomplish this with bash, using the following:

tail -n +$((2 + $(grep -n "DELTA Energy Terms" input.txt | cut -d":" -f1) )) input.txt | cut -d"," -f9

The tail -n +$((2 + $(grep -n "DELTA Energy Terms" input.txt | cut -d":" -f1) )) input.txt part prints the input file starting two lines after the line that contains DELTA Energy Terms, which skips the title line and the Frame header. cut then gives you the 9th comma-separated field that you're looking for.
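
The same pipeline can be split into steps, which makes the "+2" easier to follow and leaves room for a check when the section is missing. This is only a sketch: the function name delta_total is made up, and it assumes the DELTA section is the last one in the file.

```shell
# Hypothetical wrapper around the same grep/tail/cut pipeline.
delta_total() {
    file=$1
    # Line number of the first matching title line; empty if the section is absent.
    start=$(grep -n 'DELTA Energy Terms' "$file" | head -n 1 | cut -d: -f1)
    [ -n "$start" ] || { echo "no DELTA section in $file" >&2; return 1; }
    # +2 skips the title line and the "Frame #" header line.
    tail -n +"$((start + 2))" "$file" | cut -d, -f9
}

# Usage: delta_total input.txt
```

Any blank lines after the data rows come through as empty lines, since cut prints a line without a delimiter unchanged.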

Upvotes: 1
