How to split a column value following a pattern

I am trying to split the fifth column of a .pdb file by adding a space after the chain identifier:

ATOM  12107  N   CYS  D1742     -42.369  73.203 -44.599  1.00224.20      C    N  

So that the output would look like:

ATOM  12107  N   CYS  D 1742     -42.369  73.203 -44.599  1.00224.20      C    N  

The number after the letter changes across the file. I have tried

sed -i 's/D/D /5' test.pdb

without success. I think I need a generalized pattern that matches whatever number follows the letter, so that the command works on every line of the file.
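
For reference, the /5 flag in s/D/D /5 replaces the fifth D on the line, not the D in the fifth column. The sample line contains only one D, so nothing is changed. Below is a minimal sketch of the "letter plus whatever number follows" idea. It assumes the chain identifier is a single letter glued to the digits and that this is the first space-delimited letter+digits token on the line, which holds for the sample ATOM line. split_chain is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: put a space between a single-letter chain ID and
# the residue number glued to it, e.g. " D1742 " -> " D 1742 ".
# Only the first space-delimited letter+digits token on each line is changed.
split_chain() {
    sed -E 's/ ([A-Za-z])([0-9]+) / \1 \2 /'
}

line='ATOM  12107  N   CYS  D1742     -42.369  73.203 -44.599  1.00224.20      C    N'
printf '%s\n' "$line" | split_chain
# prints: ATOM  12107  N   CYS  D 1742     -42.369  73.203 -44.599  1.00224.20      C    N
```

With GNU sed, adding -i (as in the original attempt) applies the same substitution to test.pdb in place.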

Answers (4)

potong

This might work for you (GNU sed):

sed -r 's/(\S)(\S*)/\1 \2/5' file

A column must consist of one or more non-space characters; this places a space between the first character of the fifth column and the zero or more characters that follow it.
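
\S is a GNU extension. Here is a minimal sketch of the same command for seds that lack it (for example BSD/macOS sed; that your sed is one of these is an assumption). It swaps in the POSIX class [^[:space:]]. The trailing 5 still means "replace only the 5th match", and because each match is a whole run of non-space characters, the 5th match is the 5th column regardless of which letter it starts with. split_fifth is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: same substitution as the GNU sed answer, with the
# GNU-only \S replaced by the POSIX bracket expression [^[:space:]].
# Each match is one whole column; the flag 5 selects the fifth one.
split_fifth() {
    sed -E 's/([^[:space:]])([^[:space:]]*)/\1 \2/5'
}

printf '%s\n' 'ATOM  12107  N   CYS  D1742     -42.369' | split_fifth
# prints: ATOM  12107  N   CYS  D 1742     -42.369
```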

James Brown

Using GNU awk. Since you did not specify your field separator and it seems to be runs of spaces (or, quite likely, tabs), I'm using split() to preserve the separators in the array seps and sub() to add the space to the fifth field:

$ awk ' {
    n=split($0,a,FS,seps)  # split record to a, preserve separators to seps, keep n
    sub(/D/,"& ",a[5])     # replace first D with D space (not an add :)
    for(i=1;i<=n;i++)      # iterate all a
        b=b a[i] seps[i]   # gather to buffer b
    print b; b=""          # output and clear b
}' file
ATOM  12107  N   CYS  D 1742     -42.369  73.203 -44.599  1.00224.20      C    N 

RavinderSingh13

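The following awk command may help. Note that assigning to $5 makes awk rebuild the whole line with OFS, so every column ends up tab-separated: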

awk '{$5=substr($5,1,1) FS substr($5,2)} 1' OFS="\t"  Input_file

If you need to save the output back into Input_file itself, append > temp_file && mv temp_file Input_file to the command above.
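
Here is a minimal sketch of that temp-file pattern, wrapped in a hypothetical helper named split_in_place. It uses mktemp for the temporary file, which is an addition not taken from the answer. Because mv replaces the original file with the temp file, the result takes the temp file's permissions (typically 0600).

```shell
#!/bin/sh
# Hypothetical helper: apply the awk answer to a file and replace the file
# with the result ("> temp_file && mv temp_file Input_file").
split_in_place() {
    tmp=$(mktemp) || return 1
    if awk '{$5=substr($5,1,1) FS substr($5,2)} 1' OFS="\t" "$1" > "$tmp"; then
        mv "$tmp" "$1"
    else
        rm -f "$tmp"   # do not leave the temp file behind on failure
        return 1
    fi
}

# Demo on a throwaway file holding one sample line
demo=$(mktemp)
printf '%s\n' 'ATOM  12107  N   CYS  D1742     -42.369' > "$demo"
split_in_place "$demo"
cat "$demo"   # columns are now tab-separated, with "D 1742" as column 5
```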

karakfa

With sed you need to count the fields yourself, but it won't normalize the spacing as a side effect.

$ sed -E 's/((\S+\s+){4}.)/\1 /' file

ATOM  12107  N   CYS  D 1742     -42.369  73.203 -44.599  1.00224.20      C    N 
