Thanos

Reputation: 586

Extract data from non column file (in awk)

I am trying to extract some specific values from files that are not laid out in clean columns. The files have the format

 16O     ADOPTED LEVELS, GAMMAS        1993TI07                  93NP     199902
 16O   L 0.0          0+               STABLE                                   
 16O 2 L ISPIN=0                                                                
 16O 3 L XREF=ABCDEFHIJKLMNOPQ                                                  
 16O   L 6049.4    10  0+              67 PS     5                              
 16O 2 L ISPIN=0                                                                
 16O 3 L XREF=ABCEFIJKMP                                                        
 16O   G 6048.2    10          [E0]                             100             
 16O   L 6129.89   4   3-              18.4 PS   5                              
 16O 2 L ISPIN=0$ MOMM1=+1.668 12 (1989RA17)                                    
 16O 3 L XREF=ABCEFHIJKLNOPQ                                                    
 16O   G 6128.63   4  100      [E3]                                             
 16O 2 G BE3W=13.5 7                                                            

I am interested in the values after the sequence 16O   L, for instance 0.0, 6049.4, 6129.89, etc. In general, the values that I want to extract from those files come after the sequence (Number)(Element)(spaces)L(space).

The tricky part is the number of spaces before the L: if the (Element) consists of one letter there are 3 spaces; if it consists of two letters there are 2 spaces. An example file is

 10BE    ADOPTED LEVELS, GAMMAS        2004TI06                  04NP     200705
 10BE  L 0.0         0+                1.51E+6 Y 4                              
 10BE2 L ISPIN=1 $ %B-=100                                                      
 10BE3 L XREF=ABDEFIJKLMNOPQSTUVWXYZabceghij                                    
 10BE cL T         from weighted average of T{-1/2}=1.51 Ma 6 (Hofmann et al.,  
 10BE2cL Nucl. Instrum. Meth. Phys. Res. |b 24-25 (1987) 276),                  
 10BE3cL T{-1/2}=1.53 Ma 5% (1993Mi26), and T{-1/2}=1.48 Ma 5% (1993Mi26).      
 10BE  L 3368.03   3 2+                125 FS    12                             
 10BE2 L ISPIN=1 $ %IT=100                                                      
 10BE3 L XREF=ABCDEFIJKLMNOPQRSTUVWXYZabceghij                                  
 10BE cL           B(E2)=52 e{+2} fm{+4} 6 (1987Ra01).                          
 10BE cL E         from {+9}Be(n,|g) (1983Ke11). Other value: 3368.34 keV {I43} 
 10BE2cL (1999Bu26).                                                            
 10BE2 L WIDTHG=3.66E-3 EV 35                                                   
 10BE  G 3367.415  30 100      E2                                               
 10BE2 G WIDTHG=3.66E-3 EV 35$BE2W=8.00 76                                      
 10BE  L 5958.39   5 2+                55 FS     LT                             
 10BE2 L ISPIN=1 $ %IT=100                                                      
 10BE3 L XREF=DFJKLMPRTUWYbeghi                                                 
 10BE cL E         from {+9}Be(n,|g) (1983Ke11). Other value: 5958.3 keV {I3}   
 10BE2cL (1969Al17).                                                            
 10BE  G 2589.999  60 90     GTM1                                               
 10BE  G 5955.9     5 10     LTE2                                               
 10BE  L 13.05E3   10                  290 KEV   130                        A   
 10BE2 L %A GT 0                                                                
 10BE3 L XREF=E                                                                 
 10BE cL E         |G: from {+7}Li({+7}Li,|a+{+6}He) (2001Cu06).

Is there a way to get those values using awk? Is there another language for these kinds of jobs?

I used

awk '/   L/ { print $3 } ' file

for the first file type (i.e. {3 spaces}L) and it works. I used

awk '/  L/ { print $3 } ' file

for the second file type (i.e. {2 spaces}L) and it gives weird results (i.e. it also prints values that come after the sequence (two spaces)G), and I cannot understand why. The only way it works is to use

awk '/  L / { print $3 } ' file

(i.e. one extra space after the L). Why is this happening for the second file type? Is there a way to use one command for both file types?

Upvotes: 1

Views: 184

Answers (3)

BMW

Reputation: 45293

Using awk

awk '/^ *[0-9]+[A-Z]+ +L / { print $3 }' file

or

awk '$1 ~ /^[0-9]+[A-Z]+$/ && $2 == "L" { print $3 }' file
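A minimal sketch of the field-based test, wrapped in a shell function so it can be tried on a few records copied from the question. The function name `levels` and the variable `sample` are made up for illustration. It assumes the first field of a primary record is exactly digits followed by uppercase letters, so continuation records such as `10BE2 L` and comment records such as `10BE cL` are skipped.

```shell
#!/bin/sh
# levels: hypothetical helper that prints the third field of primary "L" records.
# Usage: levels FILE...   (reads standard input when no file is given)
levels() {
    # $1 must be digits followed only by uppercase letters (16O, 10BE),
    # which rejects continuation IDs such as 10BE2; $2 must be exactly "L",
    # which rejects comment records whose second field is "cL".
    awk '$1 ~ /^[0-9]+[A-Z]+$/ && $2 == "L" { print $3 }' "$@"
}

# A few records copied from the question (both file types).
sample=' 10BE  L 5958.39   5 2+                55 FS     LT
 10BE2 L ISPIN=1 $ %IT=100
 10BE cL E         from {+9}Be(n,|g) (1983Ke11).
 10BE  G 5955.9     5 10     LTE2
 16O   L 6049.4    10  0+              67 PS     5
 16O 2 L ISPIN=0'

# Prints 5958.39 and 6049.4, one per line.
printf '%s\n' "$sample" | levels
```

For comparison, running `awk '/  L/ { print $3 }'` on the same sample also prints 5955.9 from the G record, because the regular expression is not tied to the start of the line and the two spaces followed by L are found in `     LTE2` near the end of that record. Testing the fields, or anchoring the pattern with `^`, avoids this.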

Using grep

grep -Po '^ *\d+[A-Z]+ +L \K\S+' file

Upvotes: 1

Gaurav

Reputation: 114

Are you looking for the values on the lines containing "16O   L"? If that's the case, this should do the job:

awk '/16O   L/ { print $3 } ' filename

Upvotes: 1

Kent

Reputation: 195209

When I saw this question, I thought it would be an easy grep one-liner. I was wrong! I tested my grep line at least 10 times and it didn't work. Finally I found out why. "sh*t!"

The data in your example starts with:

16O ....

(with the letter O), but I was reading it as:

160 ....

(with the digit 0). See the difference? :(

OK, here is the line:

grep -Po '^16O {3}L \K[\d.]*' file

it outputs:

0.0
6049.4
6129.89
6917.1
7116.85
8871.9
9585
9844.5
10356
10957
11080
11096.7
11260
11520
11600
12049
12440
12530
....

If you want it in your "general" form:

grep -Po '^\d\d[A-Z] {3}L \K[\d.]*' file
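The pattern above expects a one-letter element followed by exactly three spaces. A possible variant that covers two-letter elements as well is sketched below. It assumes GNU grep built with PCRE support (`-P`), allows optional leading blanks as they appear in the question's listing, and captures the whole value with `\S+` so that entries like 13.05E3 are not cut off at the E. The function name `level_energies` is hypothetical.

```shell
#!/bin/sh
# level_energies: hypothetical helper; requires GNU grep with -P (PCRE).
# Usage: level_energies FILE...   (reads standard input when no file is given)
level_energies() {
    # ^ *          optional leading blanks
    # \d+          mass number (16, 10, ...)
    # [A-Z]{1,2}   one- or two-letter element symbol
    #  +L          one or more spaces, then L and a single space
    # \K           drop everything matched so far from the output
    # \S+          the value itself, up to the next blank
    grep -Po '^ *\d+[A-Z]{1,2} +L \K\S+' "$@"
}

# Records copied from the question; prints 6129.89 and 13.05E3.
printf '%s\n' \
    ' 16O   L 6129.89   4   3-              18.4 PS   5' \
    ' 16O 2 L ISPIN=0$ MOMM1=+1.668 12 (1989RA17)' \
    ' 16O   G 6128.63   4  100      [E3]' \
    ' 10BE  L 13.05E3   10                  290 KEV   130                        A' \
    ' 10BE2 L %A GT 0' \
    ' 10BE  G 5955.9     5 10     LTE2' |
    level_energies
```

Because the pattern is anchored with `^`, the trailing `LTE2` on the G record cannot match, and continuation records such as `10BE2 L` fail because a digit sits where the spaces are expected.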

Upvotes: 0
