Raymond gsh

Reputation: 383

Awk read 3 characters after a specific character

I have an output file which has a part shown below.

================================================================================
                                       INPUT FILE
================================================================================
NAME = t-Butylvinylidene-s.inp
|  1> ! LPNO-CCSD cc-pVTZ cc-pVTZ/C UNO TIGHTSCF TIGHTOPT Grid6 NOFINALGrid NUMGRAD PAL4
|  2> 
|  3> %geom Scan
|  4> A 2 1 15 = 67, 71, 10
|  5> end
|  6> end
|  7> 
|  8> *xyz 0 1 
|  9> 6        4.053878000    -18.527907000     -3.717354000
| 10> 6        3.588474000    -18.874154000     -5.083237000
| 11> 6        2.917226000    -19.112390000     -6.132425000
| 12> 6        2.817703000    -18.178677000     -2.886206000
| 13> 1        2.133454000    -19.025647000     -2.847879000
| 14> 1        3.094894000    -17.913405000     -1.866801000
| 15> 1        2.286657000    -17.336824000     -3.329174000
| 16> 6        5.010397000    -17.327109000     -3.786851000
| 17> 1        5.368223000    -17.071145000     -2.789879000
| 18> 1        5.877217000    -17.555623000     -4.406951000
| 19> 1        4.511903000    -16.455783000     -4.209438000
| 20> 6        4.792242000    -19.727095000     -3.102721000
| 21> 1        5.654756000    -20.005483000     -3.708269000
| 22> 1        5.149078000    -19.479242000     -2.103325000
| 23> 1        4.135842000    -20.593249000     -3.030303000
| 24> 1        4.320782000    -19.183475000     -5.923829000
| 25> *
| 26> 
| 27>                          ****END OF INPUT****
================================================================================

I want to read the third field and the last three fields of this line:

|  4> A 2 1 15 = 67, 71, 10

I currently use the code below to do this:

read -r -a scanopt <<< $(awk '
/INPUT FILE/ { input=1;}
input && 
/geom Scan/ {getline;gsub(",",""); print $3,$8,$9,$10,"T";exit}
' OFS="\t" "$path")

The input flag is there to make sure I match the right line. My problem is that the line can vary, so the last three numbers I need end up at different positions. A few examples:

B 1 2 = 1.2, 2, 9
D 4 8 9 5 = 50, 60, 12

I need the first token of that line (A, B or D) and the last three values. The first has a constant position, so that is easy, but what about the last three? All I can think of is a big loop with lots of ifs.

Another issue I want to handle is that somebody might enter the same information in the input file with a different layout, such as:

 %geom Scan A 2 1 15 = 67, 71, 10
end

or

 %geom Scan 

A 2 1 15 = 67, 71, 10
end

So I actually need to process the text word by word, from where I encounter %geom Scan until end. Right now I am doing it line by line.

Upvotes: 0

Views: 416

Answers (2)

karakfa

Reputation: 67507

I guess you mean third and last three fields?

awk '{print $3, $(NF-2), $(NF-1), $NF}' 

will do that.
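
As a quick check of the $(NF-k) idiom, here is a minimal sketch run on the three sample lines from the question. It assumes every line carries the "|  4> " listing prefix seen in the output file (the B and D lines in the question are shown without it), and the helper name last_three is made up for illustration. The commas are stripped with gsub first, as in the question's script.

```shell
# Hypothetical helper: print field 3 and the last three fields of each line.
# NF is the number of fields, so $(NF-2), $(NF-1) and $NF count from the end.
last_three() {
  awk '{ gsub(/,/, ""); print $3, $(NF-2), $(NF-1), $NF }'
}

# Sample lines; the "|  4> " prefix on the B and D lines is an assumption.
printf '%s\n' \
  '|  4> A 2 1 15 = 67, 71, 10' \
  '|  4> B 1 2 = 1.2, 2, 9' \
  '|  4> D 4 8 9 5 = 50, 60, 12' | last_three
# prints:
# A 67 71 10
# B 1.2 2 9
# D 50 60 12
```

Because the last three values are addressed relative to NF, the number of atom indices between the letter and the equals sign does not matter.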

For the other requirements, I think this should work:

awk '     /end/{f=0} 
   /%geom Scan/{f=1;sub(/^.*%geom Scan/,"")} 
        f&&NF>3{print $3,$(NF-2),$(NF-1),$NF}' 

Updated to trim the header line and guard for field count.
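
One thing to watch with the different layouts is that the scan letter is not always field 3: once the header is trimmed from a same-line entry, the letter becomes field 1. A possible way around this, sketched here under assumptions rather than taken from the answer, is to normalize every line first (drop the "|  N> " listing prefix if present, then drop everything up to "%geom Scan") and read the first and last three fields of the first line that contains "=". The function name scan_fields is hypothetical, and the sketch assumes the keyword is spelled exactly "%geom Scan" and that the scan definition sits on a single line.

```shell
# Hypothetical helper: extract the scan letter and the last three values
# from a %geom Scan block, whatever the layout. Reads files or stdin.
scan_fields() {
  awk -v OFS='\t' '
    { sub(/^[|] *[0-9]+> */, "") }                  # drop "|  4> " prefix, if any
    /%geom Scan/ { inscan = 1; sub(/^.*%geom Scan/, "") }  # enter block, trim header
    inscan && /^[ \t]*end/ { exit }                 # block closed without a match
    inscan && /=/ {                                 # assumed: definition contains "="
      gsub(/,/, "")
      print $1, $(NF-2), $(NF-1), $NF, "T"
      exit
    }
  ' "$@"
}

# Same-line layout from the question:
printf '%s\n' ' %geom Scan A 2 1 15 = 67, 71, 10' 'end' | scan_fields
```

With the prefix and the header removed, the letter is always field 1, so the same print statement serves the listing format, the same-line form and the form with a blank line after the header.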

Upvotes: 3

Jeff Y

Reputation: 2456

The key is knowing what you can count on 99.9% and what you can't in the input. And also knowing that awk allows for picking off fields "from the end" as well.

It looks to me like you can always count on the lines of interest (and only those lines) to contain the pattern [digit][optional spaces][equal sign]. If that is true, this should work:

awk '/[0-9]\s*=/{print $3, $(NF-2), $(NF-1), $NF, "T"; exit}'

For your second case, you'd add a second pattern before the one above (to catch it first):

awk '/%geom Scan .*=/{print $5, $(NF-2), $(NF-1), $NF, "T"; exit}
     /[0-9]\s*=/{print $3, $(NF-2), $(NF-1), $NF, "T"; exit}'

Upvotes: 2
