Alam

Reputation: 325

Bash code for selecting a few columns from a variable

I have a list of coordinates stored in a file (see figure, left). From it I want to copy only the coordinates (marked in red) and put them in another file.

I copy the correct section from the file using COORD=`grep -B${i} '&END COORD' ${cpki_file}`. Then I tried to use awk to extract the required numbers from the COORD variable. It does output all the numbers in the file, but it deletes the spaces between the values (figure, right).

How can I write out the red-marked section as it is?

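[Figure: left, the input file with the coordinate columns marked in red; right, the awk output with the spaces between values missing]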

N=200
NEndCoord=`grep -B${N} '&END COORD' ${cpki_file}|wc -l`
NCoord=`grep -B${N} '&END COORD' ${cpki_file}| grep -B200  '&COORD' |wc -l`
let i=$NEndCoord-$NCoord

COORD=`grep -B${i} '&END COORD' ${cpki_file}`

echo "$COORD" | awk '{ print $2 $3  $4 }'
echo "$COORD" | awk '{ print $2 $3  $4 }'>tmp.txt
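A minimal sketch of what happens in the awk call above, using a made-up sample line rather than the actual file: fields written side by side in print are concatenated, while commas insert the output field separator OFS (a single space by default, changeable with -v).

```shell
#!/usr/bin/env bash
# Hypothetical sample line: element symbol followed by x, y, z.
line='   C     1.000    2.000    3.000'

# $2 $3 $4 with no commas: awk concatenates the three strings.
joined=$(printf '%s\n' "$line" | awk '{ print $2 $3 $4 }')

# $2, $3, $4 with commas: fields are joined with OFS (a space by default).
spaced=$(printf '%s\n' "$line" | awk '{ print $2, $3, $4 }')

# OFS can be set to something else, e.g. a tab, via -v.
tabbed=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ print $2, $3, $4 }')

printf '%s\n' "$joined" "$spaced" "$tabbed"
```

The first line of output is the glued-together "1.0002.0003.000" seen in the question; the other two keep the values separated.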

Upvotes: 1

Views: 81

Answers (2)

Léa Gris

Reputation: 19685

sed one-liner:

sed -n '/^&COORD$/,/^UNIT/{s/.*[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)/\1\t\2\t\3/p}' <infile.txt >outfile.txt

Explanation:

Invocation:

  • sed: stream editor
    • -n: do not print lines unless explicitly requested (with p)

Commands in sed:

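  • /^&COORD$/,/^UNIT/: Selects the range of lines from &COORD up to and including the first following line that starts with UNIT.
  • {s/.*[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)/\1\t\2\t\3/p}: Processes each selected line.
    • s/.*[[:space:]]\+\(.*\)[[:space:]]\+\(.*\)[[:space:]]\+\(.*\): Regex capturing the space-delimited fields except the first (the element symbol). \+ is a GNU sed extension.
    • /\1\t\2\t\3/: Replaces the line with the captured groups, separated by tabs (\t in the replacement is also GNU-specific).
    • p: Explicit printout. Only lines where the substitution succeeded are printed, so lines with too few fields to match, such as &COORD or UNIT angstrom, are skipped.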
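Because the captured groups are unbounded .*, a line with several spaces between columns can in principle be split between the groups in more than one way. A stricter variant of the same idea, sketched below under the assumption of GNU sed (for \+ and \t) and using a hypothetical sample input, matches each field as a run of non-space characters and anchors the whole line:

```shell
#!/usr/bin/env bash
# Requires GNU sed: \+ and \t are GNU extensions.
# Hypothetical input/output files; substitute your own paths.
infile=$(mktemp)
outfile=$(mktemp)
cat >"$infile" <<'EOF'
&COORD
   C     0.000000    0.000000    0.000000
   H     0.000000    0.000000    1.089000

UNIT angstrom
&END COORD
EOF

# Same range as above; each field is [^[:space:]]\+, so a run of several
# spaces can only ever be consumed by the separators between fields.
sed -n '/^&COORD$/,/^UNIT/{
s/^[[:space:]]*[^[:space:]]\+[[:space:]]\+\([^[:space:]]\+\)[[:space:]]\+\([^[:space:]]\+\)[[:space:]]\+\([^[:space:]]\+\)[[:space:]]*$/\1\t\2\t\3/p
}' "$infile" >"$outfile"

cat "$outfile"
rm -f "$infile"   # $outfile is kept so the result can be inspected
```

Lines with fewer than four fields (&COORD, the empty line, UNIT angstrom) do not match and are therefore not printed.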

Upvotes: 0

kvantour

Reputation: 26591

When you find yourself combining grep, sed, awk, cut and the like, it is usually possible to do everything in a single awk command. For the OP's case, this does the same job:

awk '/[&]END COORD/{p=0}
     p { print $2,$3,$4 }
     /[&]COORD/{p=1}' file

This parses the file while keeping track of a printing flag p. The flag is set when "&COORD" is found and cleared when "&END COORD" is found, and a line is printed only while p is set. Since we don't want to print the "&END COORD" line, the flag has to be cleared before the print check. The "&COORD" line should not be printed either, so there the flag has to be set after the print check (a slightly counter-intuitive, reversed ordering). Note also the commas in print $2,$3,$4: they separate the fields with the output field separator (a single space by default), whereas print $2 $3 $4 concatenates them, which is why the spaces disappeared in the question.
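To see the flag logic in action, the command can be run on a small input shaped like the block in the question; the sample file contents below are an assumption, not the OP's actual file:

```shell
#!/usr/bin/env bash
# Hypothetical input shaped like the coordinate block in the question.
infile=$(mktemp)
cat >"$infile" <<'EOF'
&COORD
   C     0.000000    0.000000    0.000000
   H     0.000000    0.000000    1.089000

UNIT angstrom
&END COORD
EOF

# p is cleared on "&END COORD" before the print rule runs and set on
# "&COORD" after it, so neither marker line is printed.
out=$(awk '/[&]END COORD/{p=0}
           p { print $2,$3,$4 }
           /[&]COORD/{p=1}' "$infile")

printf '%s\n' "$out"
rm -f "$infile"
```

Besides the two coordinate lines, the output contains a line of two spaces (from the empty line, whose $2, $3 and $4 are empty) and a line starting with angstrom (from UNIT angstrom).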

The problem with the above is that it also processes the lines inside the block that are not coordinates, namely an empty line and

UNIT angstrom

To remove these, you can check the total number of columns (fields):

awk '/[&]END COORD/{p=0}
     p && (NF==4){ print $2,$3,$4 }
     /[&]COORD/{p=1}' file

Or print only the lines that are not empty and whose first field is not "UNIT":

awk '/[&]END COORD/{p=0}
     p && (NF>0) && ($1 != "UNIT"){ print $2,$3,$4 }
     /[&]COORD/{p=1}' file
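If "as they are" means keeping the original column spacing rather than re-joining the fields with single spaces, one option is to strip only the leading element symbol from $0 instead of printing $2,$3,$4. A sketch, assuming an awk with POSIX character classes (e.g. GNU awk) and a hypothetical sample input, writing the result to a separate file:

```shell
#!/usr/bin/env bash
# Hypothetical input and output files; substitute your own paths.
infile=$(mktemp)
outfile=$(mktemp)
cat >"$infile" <<'EOF'
&COORD
   C     0.000000    0.000000    0.000000
   H     0.000000    0.000000    1.089000

UNIT angstrom
&END COORD
EOF

awk '/[&]END COORD/{p=0}
     p && NF==4 {
       # Delete leading blanks and the first field (the element symbol);
       # the remaining text keeps its original spacing.
       sub(/^[[:space:]]*[^[:space:]]+/, "")
       print
     }
     /[&]COORD/{p=1}' "$infile" >"$outfile"

cat "$outfile"
rm -f "$infile"   # $outfile is kept so the result can be inspected
```

The NF==4 test is the same filter as above, so the empty line and UNIT angstrom are dropped; the trade-off is that each output line keeps the whitespace that originally followed the element symbol.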

Upvotes: 2
