sweetandtangy

Reputation: 99

How do you select a certain part of a grep output?

I am trying to substitute the coordinates from a particular line in one file for the coordinates in a different file. Both files contain a line with a "code word" in it, and that line is where the coordinates are found. The coordinates also occupy the same columns, 33-54, if that helps. How can I store a certain part of the line of interest in a variable so I can use sed to substitute it? This is what I have so far:

#!/bin/bash 
FILE=$1 
grep -i "ABC DEF" $FILE.pdb 

# Somehow select the coordinates in the line with "ABC DEF" in $FILE.pdb and label it PDBcoords
PDBcoords=$unknownfunction1

# Somehow select the coordinates in the line with "ABC DEF" in reference.pdb and label it refcoords
grep -i "ABC DEF" reference.pdb
refcoords=$unknownfunction2

sed -i 's/$refcoords/$PDBcoords/' 
wait
echo "Whole Command Done for $FILE"

The grep output looks like this:

ATOM   5103  ABC DEF A 100       5.817   2.502 -21.483  1.00 13.63           O

and I only want to select the coordinates

5.817   2.502 -21.483

However, these coordinates change for every file, so I need to label these columns as a variable. Same goes for the reference pdb.

EDIT: I came up with this solution:

#!/bin/bash
FILE=$1
PDB=$(grep -i "OXT ORN" $FILE.pdb | cut -c 33-54)
PDBcoords="$(echo "$PDB")"
echo $PDBcoords
echo Found PDB Coordinates for $FILE
pkaSH=$(grep -i "OXT  ORN" pkaSH.pdb | cut -c 33-54)
pkaSHcoords="$(echo "$pkaSH")"
echo $pkaSHcoords
echo Found pkaSH Coordinates for $FILE
sed -i "s/$pkaSHcoords/$PDBcoords/" pkaSH.pdb
echo Command Done

My idea was to redirect the grep output to a temporary file, cut out the coordinate columns, and then define that as a variable with spaces preserved. I'm sure this was overcomplicated, but since it works I think I have my answer.
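For comparison, here is a minimal sketch of the same flow with the quoting tightened. It is not a drop-in replacement: the file name input.pdb, the sample rows, and the variable names new, old, and old_re are made up for illustration, the rows assume three 8-column coordinate fields ending at column 54, and GNU sed is assumed for -i.

```shell
#!/bin/bash
# Sketch of the script above with quoting tightened. The file names and the
# sample rows are invented for illustration; GNU sed is assumed for -i.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Fixed-width rows: 30 columns of labels, then three 8-column coordinates.
fmt='%-30s%8s%8s%8s%6s%6s\n'
printf "$fmt" 'ATOM   5103  OXT ORN A 100' 5.817 2.502 -21.483 1.00 13.63 > input.pdb
printf "$fmt" 'ATOM   5103  OXT ORN A 100' 10.227 23.285 -1.223 1.00 13.63 > pkaSH.pdb

# Command substitution keeps leading and interior spaces; they survive as
# long as the variable is double-quoted wherever it is expanded.
new=$(grep -i 'OXT ORN' input.pdb | cut -c 33-54)
old=$(grep -i 'OXT ORN' pkaSH.pdb | cut -c 33-54)

# Escape the dots so sed matches them literally rather than as "any character".
old_re=$(printf '%s' "$old" | sed 's/\./\\./g')

sed -i "s/${old_re}/${new}/" pkaSH.pdb
cat pkaSH.pdb
```

One thing worth checking against the real files: in the layout assumed here, an x value such as 103.227 would start in column 32, so cut -c 33-54 would clip its first digit.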

Upvotes: 0

Views: 1728

Answers (4)

markp-fuso

Reputation: 35366

Assumptions/Understandings ...

  • OP has mentioned the coordinates are always in columns 33-54 (ie, data is in a fixed-width format as opposed to some sort of delimited format)
  • the sample data shows the coordinates are in columns 36-56 (inclusive)
  • for the sake of this answer I'm going to assume the coordinates reside in columns 33-56 (inclusive; total of 24 columns); this will allow me to use the sample data
  • assuming various non-coordinate columns may have embedded spaces (eg, code word)
  • assuming the search pattern (eg, code name) will only match a single row in each file ($FILE.pdb and reference.pdb)

Sample data (in place of $FILE.pdb I'm using codeword.pdb):

$ cat codeword.pdb
ATOM   5103  something else       23.219  12.880 -78.003  1.00 13.63           O
ATOM   5103  code name A 100       5.817   2.502 -21.483  1.00 13.63           O
ATOM   5103  not this line buddy 105.199 342.192  -1.423  1.00 13.63           O

One idea using grep and cut:

ptn="code name"

grep -i "${ptn}" codeword.pdb | cut -c33-56

This generates:

   5.817   2.502 -21.483

Capturing the output to a variable:

PDBcoords="$(grep -i "${ptn}" codeword.pdb | cut -c33-56)"

echo ".${PDBcoords}."                  # decimals are added as visual delimiters
echo "${#PDBcoords}"                   # number of characters in variable

This generates:

.   5.817   2.502 -21.483.
24

NOTES:

  • the output does contain some leading spaces; for now I'm assuming this is good in case a replacement string is wider, ie, this should ensure columns 33-56 are replaced (assuming, of course, that for all files the coordinates span the same number of columns)
  • OP should be able to use the same code to pull coordinates from reference.pdb for storage in the $refcoords variable
  • OP can change the numbers in this code to match the actual column positions (and widths) for both files $FILE.pdb and reference.pdb
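To reuse the extraction for both files, the grep and cut pair could be wrapped in a small function. This is only a sketch: get_coords is a hypothetical helper name, the two files are throwaway samples laid out like the rows above (coordinates in columns 33-56), and the match count check simply enforces the single-row assumption listed earlier.

```shell
#!/bin/bash
# Sketch only: get_coords is a hypothetical helper, and the two files are
# throwaway samples with the coordinates in columns 33-56.
get_coords() {
  local pattern=$1 file=$2 matches
  # Enforce the "pattern matches exactly one row" assumption.
  matches=$(grep -ic -- "$pattern" "$file")
  if [ "$matches" -ne 1 ]; then
    echo "expected 1 match for '$pattern' in $file, got $matches" >&2
    return 1
  fi
  grep -i -- "$pattern" "$file" | cut -c33-56
}

dir=$(mktemp -d)
fmt='%-32s%8s%8s%8s%6s%6s\n'
printf "$fmt" 'ATOM   5103  code name A 100' 5.817 2.502 -21.483 1.00 13.63 > "$dir/codeword.pdb"
printf "$fmt" 'ATOM   5103  code name A 100' 103.227 23.285 -1.223 1.00 13.63 > "$dir/reference.pdb"

PDBcoords=$(get_coords 'code name' "$dir/codeword.pdb")
refcoords=$(get_coords 'code name' "$dir/reference.pdb")

# Brackets make the preserved leading spaces visible.
printf '[%s]\n' "$PDBcoords" "$refcoords"
```

Both variables come back 24 characters wide here, which is what keeps the columns aligned when one is later substituted for the other.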

As for the sed portion of OP's code ...

  • at the time I wrote up this answer the sed command was incomplete (I'm assuming the sed target is $FILE.pdb)
  • assuming there could be multiple lines with the same coordinates, we'll need to match on both code name and $PDBcoords

One sed idea:

ptn="Code NAME"                          # mix it up, show case insensitivity
PDBcoords="   5.817   2.502 -21.483"
refcoords=" 103.227  23.285  -1.223"

sed "/${ptn}/Is/${PDBcoords}/${refcoords}/" codeword.pdb

Where:

  • /I - perform case insensitive match (the I address modifier is a GNU sed extension)
  • s/ .... / .... / - replace old coordinates with new coordinates (assumes the 2 variables (PDBcoords and refcoords) are of the same length in order to maintain column positions in the output)

This generates:

############## before image for sake of comparison:

ATOM   5103  something else       23.219  12.880 -78.003  1.00 13.63           O
ATOM   5103  code name A 100       5.817   2.502 -21.483  1.00 13.63           O
ATOM   5103  not this line buddy 105.199 342.192  -1.423  1.00 13.63           O

############## results of the `sed` command:

ATOM   5103  something else       23.219  12.880 -78.003  1.00 13.63           O
ATOM   5103  code name A 100     103.227  23.285  -1.223  1.00 13.63           O
ATOM   5103  not this line buddy 105.199 342.192  -1.423  1.00 13.63           O

NOTE: Once OP has confirmed this performs the desired modification the -i flag can be added to the sed command to allow for in place updating of $FILE.pdb.
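Since the data is fixed-width, another option is to overwrite columns 33-56 by position rather than matching the old coordinates as a regex. The following is a sketch of that different technique using awk, not the sed command above; the sample rows and the variable names sample, result, ptn, and new are made up for illustration.

```shell
#!/bin/bash
# Sketch: overwrite columns 33-56 by position on the matching row instead of
# matching the old coordinates as a regex. Sample rows are for illustration.
fmt='%-32s%8s%8s%8s%6s%6s\n'
sample=$(
  printf "$fmt" 'ATOM   5103  something else' 23.219 12.880 -78.003 1.00 13.63
  printf "$fmt" 'ATOM   5103  code name A 100' 5.817 2.502 -21.483 1.00 13.63
)
refcoords=$(printf '%8s%8s%8s' 103.227 23.285 -1.223)   # 24 characters wide

result=$(
  printf '%s\n' "$sample" |
  awk -v ptn='code name' -v new="$refcoords" '
    # tolower() on both sides gives a case-insensitive substring test
    index(tolower($0), tolower(ptn)) {
      # keep columns 1-32, splice in the new block, keep column 57 onward
      $0 = substr($0, 1, 32) new substr($0, 57)
    }
    { print }
  '
)
printf '%s\n' "$result"
```

With this approach the old coordinates never need to be extracted from the target file at all; only the replacement block has to be exactly 24 characters wide.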

Upvotes: 1

LC-datascientist

Reputation: 2096

You can use awk to select columns:

grep -i "code name" reference.pdb | awk '{print $7,$8,$9}'

or use cut:

grep -i "code name" reference.pdb | tr -s " " | cut -d" " -f 7-9

In both commands, you extract the seventh, eighth, and ninth fields, delimited by white space.

Edit

Reference: How to specify more spaces for the delimiter using cut?
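If the three values are wanted in separate variables, the field-based extraction can feed bash's read builtin. A minimal sketch, assuming bash (for process substitution and the here-string) and using the sample row in place of the grep output; the variable names line, x, y, and z are illustrative:

```shell
#!/bin/bash
# Sketch: split the three whitespace-delimited coordinates into variables.
# The sample row stands in for the grep output.
line='ATOM   5103  code name A 100       5.817   2.502 -21.483  1.00 13.63           O'

# read splits on runs of whitespace, so no tr -s is needed here.
read -r x y z < <(awk '{ print $7, $8, $9 }' <<< "$line")

echo "x=$x y=$y z=$z"
```

The field numbers depend on how many words precede the coordinates. On a row labelled "something else" with nothing after it, for example, the x value would be field 5 rather than field 7.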

Upvotes: 0

David C. Rankin

Reputation: 84642

Another option is:

tr -s ' ' | cut -d ' ' -f 7-9

Where tr -s compresses each run of spaces into a single space, and cut -d ' ' -f 7-9 then outputs the space-delimited 7th through 9th fields, e.g.

$ echo "ATOM   5103  code name A 100       5.817   2.502 -21.483  1.00 13.63           O" | 
tr -s ' ' | cut -d ' ' -f 7-9
5.817 2.502 -21.483

Upvotes: 2

Julien B.

Reputation: 3344

I don't know if all files have the same type of "columns", but if so, awk might be what you need:

echo "ATOM   5103  code name A 100       5.817   2.502 -21.483  1.00 13.63           O" | awk '{ print $7, $8, $9 }'

# outputs: 5.817 2.502 -21.483

Upvotes: 0
