Reputation: 335
I am trying to modify a line based on a certain condition and then print it into a new file. Unfortunately, the number of spaces between the columns must be preserved in the output. A typical line looks like this:
ATOM 301 H UREA 24 5.966 3.408 1.877 1.00 0.00 UREA N
Here is the command I use:
awk '{if ($4 == "UREA" && $2%2 == 1) sub("H","TH",$3);print $0;}' origin.dat > final.dat
Basically, I want awk to print exactly the same line (with the same number of spaces) but with a substituted third column. What it prints is:
ATOM 301 TH UREA 24 5.966 3.408 1.877 1.00 0.00 UREA H
I know I could use printf or a very long print statement, but with the number of columns the file has, that would be cumbersome. Is there an elegant way to print a line as is after the substitution? Thanks!
Upvotes: 0
Views: 1976
Reputation: 62369
If you are using GNU awk (and possibly some other versions as well), there is support for using fixed-width fields instead of delimiter-based fields. Read through man awk for more information, but your awk invocation would look something like:
awk 'BEGIN{FIELDWIDTHS="10 5 8 3 ..."}{....}'
Setting the FIELDWIDTHS variable at the beginning of the program, using a space-separated list of numbers, causes awk to split each line based on those values instead of on spaces (or other delimiters).
EDIT: Here's an example using the original data. I've had to guess some of the field widths, because the question doesn't specify them and I haven't counted them, and what was typed may not be exactly representative of the actual data anyway. I've also assumed that all spaces trail the preceding field, which may not actually be the case.
$ echo "ATOM 301 H UREA 24 5.966 3.408 1.877 1.00 0.00 UREA N" |\
awk 'BEGIN{OFS=""; FIELDWIDTHS="9 4 5 8 100"} $4 ~ /^UREA/ && $2 % 2 {sub("H ", "TH", $3); print}'
ATOM 301 TH UREA 24 5.966 3.408 1.877 1.00 0.00 UREA N
Upvotes: 3
Reputation: 203169
Modifying a field WILL cause the record to be recompiled using the OFS value as the separator. You need to modify the whole record instead using an RE interval:
$ awk '$4=="UREA" && $2%2{$0=gensub(/((\S+\s+){2})\S+/,"\\1TH","")}1' file
ATOM 301 TH UREA 24 5.966 3.408 1.877 1.00 0.00 UREA N
The above uses GNU awk for gensub(), \S, and \s.
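For awks without gensub(), the same idea (edit $0 rather than a field) can be sketched with match() and substr(), which are in POSIX awk. This is only an illustration: the sample line's spacing is invented, and replace_third_field is a hypothetical helper name, not something from the question.

```shell
# Sample line with made-up spacing (an assumption; the real file's widths may differ).
line='ATOM    301  H   UREA   24   5.966   3.408   1.877  1.00  0.00      UREA N'

# Hypothetical helper: reads records on stdin, writes them to stdout.
replace_third_field() {
  awk '$4 == "UREA" && $2 % 2 {
         # Match any leading blanks, the first two fields, and the blanks after each.
         if (match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/)) {
           head = substr($0, 1, RLENGTH)    # everything before field 3, spacing intact
           rest = substr($0, RLENGTH + 1)   # field 3 and everything after it
           sub(/^[^ \t]+/, "TH", rest)      # swap only the third field
           $0 = head rest                   # assigning $0 keeps the spacing as written
         }
       }
       1'                                   # print every record, changed or not
}

result=$(printf '%s\n' "$line" | replace_third_field)
printf '%s\n' "$result"
```

Like the gensub() version, this replaces whatever the third field is with "TH", so the rest of the line shifts right by one character when the field was a single "H".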
Upvotes: 2
Reputation: 189317
If you modify a field ($1, $2, ...), Awk will reassemble the line. But if it's a file with fixed-width columns, you should be able to figure out which character positions within the line to modify, so you don't need to touch the fields at all.
This is not particularly elegant, but it preserves your spacing:
awk '$4 == "UREA" && $2%2 == 1 { $0 = substr($0, 1, 13) "TH" substr($0, 15) } 1' origin.dat > final.dat
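To make the difference concrete, here is a small side-by-side sketch. The sample line and its spacing are invented for illustration, so the positions 13 and 15 only happen to fit this sample; the real file would need its own offsets.

```shell
# Hypothetical sample line; the spacing is invented for illustration.
line='ATOM    301  H   UREA   24'

# Assigning to a field makes awk rebuild $0, joining the fields with OFS
# (a single space by default), so the original spacing is lost.
rebuilt=$(printf '%s\n' "$line" | awk '{ $3 = "TH"; print }')

# Splicing with substr() leaves every other character of the record untouched.
# Character 14 is where "H" sits in this sample only.
spliced=$(printf '%s\n' "$line" | awk '{ print substr($0, 1, 13) "TH" substr($0, 15) }')

printf '%s\n%s\n' "$rebuilt" "$spliced"
```

The first line printed has its columns collapsed to single spaces; the second keeps the original gaps, with everything after the replaced character shifted right by one.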
Upvotes: 3
Reputation: 74596
When you modify the third field, $3, awk rebuilds the record and the original formatting is lost. The following approach may have undesired consequences depending on the values in your other fields, but one way to solve the problem is to perform the sub on the whole record:
awk '$4=="UREA" && $2%2{sub(/H/,"TH")}1' file
Remember that sub only performs one substitution, so there will only be side effects if the first or second column can contain "H". Depending on your version of awk, you could make the regex more specific using word boundaries, for example. Note that I have used /H/ as the first argument to sub, rather than "H", as this saves awk from converting the string to a regex.
As an aside, I've removed your usage of if, as the structure of an awk program is condition { action }. I've also removed the == 1 from your condition, as a number % 2 is either true (1) or false (0).
Output:
ATOM 301 TH UREA 24 5.966 3.408 1.877 1.00 0.00 UREA N
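If an earlier column might contain "H", one portable way to narrow the match (instead of relying on word-boundary operators, which differ between awk versions) is to require a blank on each side of the "H". The line below is invented data, chosen so that the first column contains an "H"; it is only a sketch of the idea.

```shell
# Hypothetical line in which an earlier column also contains "H" (invented data).
line='HETATM  301  H   UREA   24'

# Plain /H/ hits the first "H" in the record, which here is in column one.
loose=$(printf '%s\n' "$line" | awk '{ sub(/H/, "TH"); print }')

# Requiring a blank on each side restricts the match to a standalone "H" token.
strict=$(printf '%s\n' "$line" | awk '{ sub(/ H /, " TH "); print }')

printf '%s\n%s\n' "$loose" "$strict"
```

Note that this pattern would miss a standalone "H" at the very start or end of a line, since it needs a space on both sides.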
Upvotes: 2