Jason R. Mick

Reputation: 5297

Sed Pattern Match Which Uses Offset of Line Number in the Replace String?

I have a file in a fixed-width column format (characters 1 through 6 are field 1, characters 7 through 11 are field 2, and so on).

Key attributes are:

I have a file like:

REMARK   1 
HETATM    1
HETATM    5
HETATM    6
HETATM    7
HETATM    9
HETATM   12
HETATM   15
HETATM   19
HETATM   23
HETATM   27
HETATM   30
HETATM   34
HETATM   38
END

For HETATM records (lines whose first six characters equal that string), I want to replace the number in the second field (characters 7 through 11) with the entry number, starting at 1.

i.e. I want the output to appear as:

REMARK   1 
HETATM    1
HETATM    2
HETATM    3
HETATM    4
HETATM    5
HETATM    6
HETATM    7
HETATM    8
HETATM    9
HETATM   10
HETATM   11
HETATM   12
HETATM   13
END

Currently my most concise solution (using a temporary file for testing, to avoid screwing up my original) is:

#!/bin/bash
f=file.pdb
fTmp=${f}.tmp
cp "$f" "$fTmp"
# Line l+1 receives entry number l, because the first HETATM record is on line 2.
n=$(wc -l < "$fTmp")
for ((l=1; l<n; l++)); do
   sed -i "$((l + 1))"'s#\(HETATM\)[ 0-9]\{5\}#\1'"$( printf '%5s' "$l" )"'#g' "$fTmp"
done
cat "$fTmp"
rm "$fTmp"

Removing the temporary file baggage this becomes:

f=file.pdb
# Same loop, editing the target file directly (one sed -i pass per line).
n=$(wc -l < "$f")
for ((l=1; l<n; l++)); do
   sed -i "$((l + 1))"'s#\(HETATM\)[ 0-9]\{5\}#\1'"$( printf '%5s' "$l" )"'#g' "$f"
done

It seems like there should be some way to use the line number in sed to create a briefer solution, perhaps a single sed -i command. Assuming that's possible, the only complexity is that a bit of arithmetic would be necessary: the first match, which should be set to 1, always occurs on the second line.

I'm hoping there's a sed solution. I'm hesitant to use awk: since space padding is important and in-place editing is desired, sed seems like the better choice.

Note that once I have an improved solution that's verified working, I'll toss out the *.tmp file stuff and operate directly on the target file, so a single sed -i command could potentially do the job.

Upvotes: 1

Views: 879

Answers (1)

meuh

Reputation: 12255

If you have GNU awk you can specify that your input is in fixed width fields. For example,

awk -v OFS='' -v FIELDWIDTHS='6 5 6 6 6 6 6' '
/^HETATM/{ $2 = sprintf("%5d",++count) };1' file.pdb

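This replaces field 2 (width 5) with an incrementing counter, right-aligned by the %5d format. The result is written to standard output; file.pdb itself is not modified.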
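If GNU awk is not available, or the lines carry more columns than the FIELDWIDTHS list covers, a similar effect can probably be had with plain substr() and sprintf(). The following is only a sketch under some assumptions: the sample file name sample.pdb and the extra atom columns (C1, LIG, and so on) are made up for illustration, and the counter is assumed to fit in 5 digits. Columns 1 through 6 and everything from column 12 onward are copied through unchanged, so the padding is preserved.

```shell
# Hypothetical sample input; the file name and the trailing columns are assumptions.
tmpdir=$(mktemp -d)
f="$tmpdir/sample.pdb"
printf '%s\n' \
  'REMARK   1 ' \
  'HETATM    1  C1  LIG' \
  'HETATM    5  C2  LIG' \
  'HETATM    9  O1  LIG' \
  'END' > "$f"

# For each HETATM line, rebuild the record as:
#   columns 1-6, a right-aligned 5-wide counter, then the rest from column 12.
# The trailing "1" prints every line, modified or not.
out=$(awk '/^HETATM/ {
  $0 = sprintf("%s%5d%s", substr($0, 1, 6), ++count, substr($0, 12))
} 1' "$f")
printf '%s\n' "$out"

# Emulate in-place editing: write to a new file, then replace the original.
printf '%s\n' "$out" > "$f.new" && mv "$f.new" "$f"

rm -rf "$tmpdir"
```

Because the counter only increments on matching lines, no line-number arithmetic is needed, and other record types between HETATM lines would not disturb the numbering. With GNU awk 4.1 or later, running the same program as gawk -i inplace '...' file.pdb should avoid the explicit temporary file.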

Upvotes: 1
