Reputation: 586
I am trying to extract some data from a multicolumn file named mass.mas03. Columns are seperated by tab
. When the data starts you can see that there are entries that are empty(i.e. line 44 VS line 45).
The whole idea is to extract data from a specific column(MASS EXCESS, that is) for a number of rows, take the sum and create a horizontal line on a xy-axes system, with the y cordinate being the sum of those values.
As an example take lines 41,65,74
-1 0 1 1 H 7288.97050 0.00011 0.0 0.0 B- * 1 007825.03207 0.00010
0 4 4 8 Be 4941.672 0.035 7062.435 0.004 B- -17979.819 1.001 8 005305.103 0.037
4 7 3 10 Li -n 33050.581 15.124 4531.555 1.512 B- 20443.910 15.123 10 035481.259 16.236
I want to do: 7288.97050 + 4941.672 - 33050.581 = - 20819.9385 . If I use an awk
command to have access on the data, I face this problem
> awk '/8 Be/ {print $6}' mass.mas03
4941.672
>awk '/10 Li/ {print $6}' mass.mas03
-n
I don't know if awk
is the best option to do not only the data selection, as well as the "plot" so I am open to suggestions!
An obvious solution would be to use $7
instead of $6
where needed, but it's a large file and I'm trying to automate a bit the whole process.
Is there a way to "chose" the desired data and apply some simple calclations on them, for this weird column file?
Upvotes: 2
Views: 143
Reputation: 11
to me the file is still structured for Fields starts at one point and ends at another a specific field always has the same number of charecters so it is a fixed width fields
$7 starts at 35 and its width is 12
use awk '/Li/ {print substr($0,35,12)}' mass.mas03
Upvotes: 1
Reputation: 204259
if your data is really tab-separated then you'd simply use:
awk -F'\t' '{ whatever }'
If it's fixed width fields instead then you should use:
gawk 'BEGIN{FIELDWIDTHS="list of field widths"} { whatever }'
e.g.:
$ cat file
-1 0 1 1 H 7288.97050 0.00011
0 4 4 8 Be 4941.672 0.035
4 7 3 10 Li -n 33050.581 15.124
$ gawk 'BEGIN{FIELDWIDTHS="4 5 5 5 5 5 12 12"} {print $7}' file
7288.970
4941.672
33050.581
Note that FIELDWIDTHS is gawk-specific. I've no idea if those are the correct field widths or not for your data, I just did a ballpark estimate to demonstrate the approach. Obviously if you want to convert it from fixed-with fields to tab-separated then it's trivial to do that:
$ gawk -v OFS='\t' 'BEGIN{FIELDWIDTHS="4 5 5 5 5 5 12 12"} {$1=$1}1' file
-1 0 1 1 H 7288.97050 0.00011
0 4 4 8 Be 4941.672 0.035
4 7 3 10 Li -n 33050.581 15.124
and if you want to strip off the leading/trailing spaces from each field:
$ gawk -v OFS='\t' 'BEGIN{FIELDWIDTHS="4 5 5 5 5 5 12 12"} {$1=$1; gsub(/ /,"")}1' file
-1 0 1 1 H 7288.97050 0.00011
0 4 4 8 Be 4941.672 0.035
4 7 3 10 Li -n 33050.581 15.124
Upvotes: 2
Reputation: 40778
To replace spaces with tabs, you can try the following:
awk -f f.awk mass.mas03 > mass.mas03.tab
where f.awk
is:
/^1N-Z/ {
f=1
print
getline
print
next
}
f {
gsub(/ +/,"\t")
}
{print}
Upvotes: 2