Reputation: 677
I have some files that are named as follows:
d_Ca-1_O_7.dat
d_Ca-1_O_8.dat
d_Ca-1_O_14.dat
d_Ca-1_O_16.dat
d_Ca-1_O_10.dat
In each of these files I have this structure:
abcA_BCdef 1 G 1 2.4733 4.6738 7 O 0 0 0
ghiJ_KLmno 1 P 1 2.4811 4.6887 7 O 0 0 0
pqrS_TLxyz 1 L 1 2.4872 4.7000 7 O 0 0 0
...
(the same scheme)
I would like to make a bash script that goes over these files, something like:
for {i = 7, 8, 14, 16} in d_Ca-1_O_i.dat
and converts each file to this format:
A.BC 2.4733 # 0 0 0
J.KL 2.4811 # 0 0 0
S.TL 2.4872 # 0 0 0
...
(the same scheme)
where, on every line:
1) First column: strip the fixed prefix and the fixed suffix (three characters each in the example, so abcA_BCdef becomes A_BC)
2) First column: replace the _ with a .
3) Remove the 2nd, 3rd, 4th, 6th, 7th and 8th columns
4) Add a # in front of the 9th column
I would very much appreciate some help.
Upvotes: 2
Views: 65
Reputation: 10199
Assuming that your input is tab separated, here is a GNU Awk script:
script.awk:
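# Read and write tab-separated fields.
BEGIN { OFS=FS="\t" }
# Keep one char before and two chars after the underscore, joined by a dot.
{ strange = gensub(/^.*(.)_(..).*$/, "\\1.\\2", 1, $1)
# Print the new key, the 5th field, and the 9th field prefixed with "#".
print strange, $5, "#" $9 }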
Call it inside the for loop of your bash script like this: awk -f script.awk yourfile
E.g. something like:
for i in 7 8 14 16
do
awk -f script.awk "d_Ca-1_O_${i}.dat"
done
For the transformation of the first field, the script takes one char to the left and two chars to the right of an underscore. The underscore is converted to a dot, all other chars from field one are discarded.
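If the columns turn out to be separated by spaces rather than tabs, or GNU Awk is not available, the same idea can be sketched with plain POSIX awk. This is only a sketch under assumptions: every line has 11 whitespace-separated fields as in the sample, the first field contains exactly one underscore, and the function name convert_dat and the .conv output suffix are made up for illustration. Unlike the gensub version, it locates the first underscore with index() and prints fields 9 to 11 after a separate "#".

```shell
#!/usr/bin/env bash

# convert_dat: hypothetical helper, not part of the original answer.
# Reads the named files (or stdin) and prints: key, 5th field, "#", fields 9-11.
convert_dat() {
  awk '{
    i = index($1, "_")                                   # position of the first underscore
    key = substr($1, i - 1, 1) "." substr($1, i + 1, 2)  # one char before, two chars after
    print key, $5, "#", $9, $10, $11                     # assumes 11 whitespace-separated fields
  }' "$@"
}

# Convert each input file into a new file next to it (the .conv suffix is an assumption).
for i in 7 8 10 14 16
do
  f="d_Ca-1_O_${i}.dat"
  [ -f "$f" ] || continue          # skip files that do not exist
  convert_dat "$f" > "${f%.dat}.conv"
done
```

Writing to a separate file instead of overwriting the input keeps the original data intact, so the result can be checked before anything is replaced.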
Upvotes: 2