Eofet

Reputation: 335

awk print line as is (with spaces)

I am trying to modify a line based on a certain condition, and then print into a new file. Unfortunately, the file must be sensitive to a number of spaces between the columns. The typical line looks like this:

ATOM     301 H    UREA    24    5.966    3.408    1.877   1.00   0.00 UREA  N

Here is the command I use:

awk '{if ($4 == "UREA" && $2%2 == 1) sub("H","TH",$3);print $0;}' origin.dat > final.dat

Basically, I want awk to print exactly the same line (with the same number of spaces) but with a substituted third column. What it prints is:

ATOM 301 TH UREA 24 5.966 3.408 1.877 1.00 0.00 UREA N

I know I could use printf or a very long print statement, but with the number of columns the file has, that would be cumbersome. Is there an elegant way to print a line as is after the substitution? Thanks!

Upvotes: 0

Views: 1976

Answers (4)

twalberg

Reputation: 62369

If you are using GNU awk (and possibly some other versions as well), there is support for using fixed-width fields instead of delimiter-based fields. Read through man awk for more information, but your awk invocation would look something like:

awk 'BEGIN{FIELDWIDTHS="10 5 8 3 ..."}{....}'

Setting the FIELDWIDTHS variable at the beginning of the program, using a space-separated list of numbers, causes awk to split each line based on those values instead of on spaces (or other delimiters)...

EDIT: Here's an example using the original data, although I've had to guess on some of the field widths, because the question doesn't specify them, and I'm too lazy to count them, assuming what was typed is even exactly representative of the actual data... I've assumed that all spaces are trailing the preceding field, which may not actually be the case...

$ echo "ATOM     301 H    UREA    24    5.966    3.408    1.877   1.00   0.00 UREA  N" |\
  awk 'BEGIN{OFS=""; FIELDWIDTHS="9 4 5 8 100"} $4 ~ /^UREA/ && $2 % 2 {sub("H ", "TH", $3); print}'
ATOM     301 TH   UREA    24    5.966    3.408    1.877   1.00   0.00 UREA  N

Upvotes: 3

Ed Morton

Reputation: 203169

Modifying a field WILL cause the record to be recompiled using the OFS value as the separator. You need to modify the whole record instead using an RE interval:

$ awk '$4=="UREA" && $2%2{$0=gensub(/((\S+\s+){2})\S+/,"\\1TH","")}1' file
ATOM     301 TH    UREA    24    5.966    3.408    1.877   1.00   0.00 UREA  N

The above uses GNU awk for gensub(), \S, and \s.
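
The rebuild described above is easy to observe in isolation. The following is a minimal sketch using only POSIX awk features; the shortened sample line and the variable names are made up for illustration. Assigning to any field, even to itself, makes awk rejoin the fields with OFS, while a record whose fields are never assigned is printed untouched:

```shell
line='ATOM     301 H    UREA'

# No field is assigned, so $0 is printed exactly as read.
untouched=$(printf '%s\n' "$line" | awk '{ print }')

# Assigning to a field (even to itself) rebuilds $0 with the default OFS (one space).
rebuilt=$(printf '%s\n' "$line" | awk '{ $3 = $3; print }')

# The same rebuild with a visible OFS makes the effect obvious.
dashed=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "-" } { $3 = $3; print }')

printf '%s\n' "$untouched" "$rebuilt" "$dashed"
```

This is why any approach that keeps the spacing has to edit $0 as a whole rather than $3.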

Upvotes: 2

tripleee

Reputation: 189317

If you modify the positional parameters, Awk will reassemble the line. But if it's a file with fixed-width columns, you should be able to figure out which positions within the line to modify, so you don't need to modify the positional parameters.

This is not particularly elegant, but it preserves your spacing:

awk '$4 == "UREA" && $2%2 == 1 { $0 = substr($0, 1, 13) "TH" substr($0, 15) } 1' origin.dat > final.dat
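
If the exact character positions are not known in advance, the offset of the third column can be computed per line instead of hard-coded. This is only a sketch, assuming the columns are separated by spaces (not tabs) and that the "H" is followed by at least one space; the variable names are made up for illustration. It uses match() to find where the first two fields end, then replaces "H " with "TH" so that the columns to the right do not shift:

```shell
line='ATOM     301 H    UREA    24    5.966    3.408    1.877   1.00   0.00 UREA  N'

spliced=$(printf '%s\n' "$line" | awk '
$4 == "UREA" && $2 % 2 == 1 {
    # Match the first two fields plus the spaces that follow them.
    if (match($0, /^ *[^ ]+ +[^ ]+ +/)) {
        head = substr($0, 1, RLENGTH)      # everything before field 3
        rest = substr($0, RLENGTH + 1)     # field 3 onwards
        sub(/^H /, "TH", rest)             # "H " -> "TH", same width
        $0 = head rest                     # assigning $0 keeps the spacing
    }
}
{ print }')

printf '%s\n' "$spliced"
```

Because the final `{ print }` has no condition, lines that do not match are passed through unchanged.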

Upvotes: 3

Tom Fenech

Reputation: 74596

When you modify the third field, $3, awk rebuilds the record and the original formatting is lost. The following approach may have undesired consequences depending on the values in your other fields, but one way to solve the problem is to perform the sub on the whole record:

awk '$4=="UREA" && $2%2{sub(/H/,"TH")}1' file

Remember that sub only performs one substitution, so there will only be side-effects if the first or second column can contain "H". Depending on your version of awk, you could make the regex more specific using word boundaries, for example. Note that I have used /H/ as the first argument to sub, rather than "H", as this saves awk from converting the string to a regex.

As an aside, I've removed your usage of if as the structure of an awk program is condition { action }. I've also removed the == 1 from your condition as a number % 2 is either true (1) or false (0).

Output:

ATOM     301 TH    UREA    24    5.966    3.408    1.877   1.00   0.00 UREA  N
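
The side-effect mentioned above can be narrowed without word boundaries by requiring a space on each side of the "H". This is a sketch under the assumption that the third column is always followed by at least one space; the HETATM input line is a hypothetical example chosen because its first column contains an "H". Replacing " H " with " TH" also consumes one trailing space, so the later columns stay aligned:

```shell
line='HETATM   301 H    UREA'

# Plain /H/ hits the first "H" in the record, which is in column 1 here.
naive=$(printf '%s\n' "$line" | awk '$4 == "UREA" && $2 % 2 { sub(/H/, "TH") } { print }')

# Requiring surrounding spaces only matches a stand-alone "H".
narrow=$(printf '%s\n' "$line" | awk '$4 == "UREA" && $2 % 2 { sub(/ H /, " TH") } { print }')

printf '%s\n' "$naive" "$narrow"
```

The first result is `THETATM   301 H    UREA`, the second is `HETATM   301 TH   UREA`. A stand-alone "H" in the first or second column would still be matched first, so this narrows the problem rather than removing it.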

Upvotes: 2
