In Bash, modifying columns and rows in a file

Question

I have some files that are named as follows:

 d_Ca-1_O_7.dat
 d_Ca-1_O_8.dat
 d_Ca-1_O_14.dat
 d_Ca-1_O_16.dat
 d_Ca-1_O_10.dat

In each of these files I have this structure:

 abcA_BCdef  1 G   1     2.4733     4.6738    7 O    0 0 0
 ghiJ_KLmno  1 P   1     2.4811     4.6887    7 O    0 0 0
 pqrS_TLxyz  1 L   1     2.4872     4.7000    7 O    0 0 0
 ... 
 (the same scheme)

I would like to make a bash script that goes over these files, something like:

for {i = 7, 8, 14, 16} in d_Ca-1_O_i.dat

and converts each file to this format:

 A.BC     2.4733     #  0 0 0
 J.KL     2.4811     #  0 0 0
 S.TL     2.4872     #  0 0 0
 ... 
 (the same scheme)

In every line:

1) First column: drop the same number of characters from the beginning and from the end (abcA_BCdef becomes A_BC)

2) First column: replace the _ with a .

3) Remove the 2nd, 3rd, 4th, 6th, 7th and 8th columns

4) Add a # in front of the 9th column

I would very much appreciate some help.

Lars Fischer · Accepted Answer

Assuming that your input is tab separated, here is a GNU Awk script:

script.awk:

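BEGIN { OFS = FS = "\t" }   # tab-separated input and output
      { # keep one char before and two chars after the underscore, joined by a dot
        strange = gensub(/^.*(.)_(..).*$/, "\\1.\\2", 1, $1)
        print strange, $5, "#" $9 }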

Use it inside the for loop of your Bash script like this: awk -f script.awk yourfile

E.g. something like:

for i in 7 8 14 16 
do 
  awk -f script.awk "d_Ca-1_O_${i}.dat"
done
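
The loop above prints the converted lines to the terminal. If you would rather keep the results, here is a minimal sketch that writes each one to a separate file. The function name convert_all and the .conv suffix are assumptions made for illustration, and script.awk is expected in the current directory:

```shell
# Hypothetical helper: convert each listed file and save the result
# next to it as d_Ca-1_O_<i>.conv (the suffix is an arbitrary choice).
convert_all() {
  for i in 7 8 14 16; do
    src="d_Ca-1_O_${i}.dat"
    # Skip files that do not exist instead of creating empty outputs.
    [ -f "$src" ] || { echo "skipping missing $src" >&2; continue; }
    awk -f script.awk "$src" > "${src%.dat}.conv"
  done
}
```

Call convert_all from the directory that holds the .dat files. The originals are left untouched, so you can compare input and output before replacing anything.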

For the transformation of the first field, the script takes one char to the left and two chars to the right of an underscore. The underscore is converted to a dot, all other chars from field one are discarded.
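
If GNU Awk is not available, the same idea can be sketched with index and substr, which every POSIX awk provides. This is an alternative under stated assumptions, not the script above: it assumes tab-separated input and exactly one underscore in the first field (with several underscores it would use the first one, whereas the greedy regex picks the last). The function name shorten_fields is made up for this example:

```shell
# Sketch: first-field transformation without gensub (POSIX awk only).
# Assumes tab-separated input and a single underscore in field 1.
shorten_fields() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         u = index($1, "_")    # position of the underscore in field 1
         # one char before the underscore, a dot, two chars after it
         key = substr($1, u - 1, 1) "." substr($1, u + 1, 2)
         print key, $5, "#" $9
       }' "$@"
}

# Try it on the first sample line from the question (tabs between fields).
printf 'abcA_BCdef\t1\tG\t1\t2.4733\t4.6738\t7\tO\t0 0 0\n' | shorten_fields
```

For the sample line, index returns 5, so the key is built from character 4 (A) and characters 6 to 7 (BC), giving A.BC. The function also accepts file names as arguments, e.g. shorten_fields d_Ca-1_O_7.dat.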
