Chatul

Reputation: 13

String manipulation within a file column in Unix

I have a large tab-delimited text file with 22 columns and up to 10^6 lines. Column 7 is an 11-character string that I need to edit as follows: the last 5 characters (characters 7-11) need to become the first 5 characters.

For example, current file looks like:

col1a col2a col3a col4a col5a col6a XXXXXXAAAAA col8a ...
col1b col2b col3b col4b col5b col6b XXXXXXBBBBB col8b ...
col1c col2c col3c col4c col5c col6c XXXXXXCCCCC col8c ...
col1d col2d col3d col4d col5d col6d XXXXXXDDDDD col8d ...
....

The desired output is:

col1a col2a col3a col4a col5a col6a AAAAAXXXXXX col8a ...
col1b col2b col3b col4b col5b col6b BBBBBXXXXXX col8b ...
col1c col2c col3c col4c col5c col6c CCCCCXXXXXX col8c ...
col1d col2d col3d col4d col5d col6d DDDDDXXXXXX col8d ...
....

It seems to me that one way of doing this is to split the relevant column in two using cut, then combine the halves again, perhaps using paste. So far I have only managed to do this in multiple steps (the original file is named short):

1) Using awk and cut to create two new files, one for each half of the column

awk ' BEGIN { FS="\t"; OFS="\t" } {print $7} ' short | cut -c1-6 > file1
awk ' BEGIN { FS="\t"; OFS="\t" } {print $7} ' short | cut -c7-11 > file2

2) Using paste to paste them back together

paste -d "" file2 file1 > file12

3) Using paste to paste new file to original

paste -d"\t" short file12 > shortCom

4) Using 'awk' to replace original column 7 with new one:

awk ' BEGIN { FS="\t"; OFS="\t" } {
$7 = $23
print $0 } ' shortCom

This is obviously a very long and cumbersome process to do something that I suspect is actually quite simple... I would be very grateful for any advice you may have on improving this, in order to make this quicker and more efficient.

Thanks!!

Upvotes: 1

Views: 2159

Answers (1)

jaypal singh

Reputation: 77105

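This should work. Column 7 is split after its first 6 characters so that the last 5 move to the front, and FS and OFS are both set to a tab so the rebuilt line stays tab-delimited:

awk 'BEGIN{FS=OFS="\t"} {y=substr($7,1,6); z=substr($7,7); $7=z""y}1' inputfile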

If you have GNU awk, then:

gawk 'BEGIN{FS=OFS="\t"} {$7=gensub(/^(.{6})(.{5})$/, "\\2\\1", "g", $7)}1' inputfile
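To check the approach on a small input before running it on the full file, here is a minimal sketch. The file names sample.tsv and swapped.tsv are hypothetical, and the sample has only 8 columns rather than 22; it assumes column 7 is always exactly 11 characters long, as described in the question.

```shell
# Build a two-line, tab-delimited sample (hypothetical file name: sample.tsv).
printf 'c1\tc2\tc3\tc4\tc5\tc6\tXXXXXXAAAAA\tc8\n'  > sample.tsv
printf 'c1\tc2\tc3\tc4\tc5\tc6\tXXXXXXBBBBB\tc8\n' >> sample.tsv

# Move the last 5 characters of column 7 to the front.
# FS and OFS are both tabs, so the rewritten line stays tab-delimited.
awk 'BEGIN { FS = OFS = "\t" } { $7 = substr($7, 7) substr($7, 1, 6) } 1' sample.tsv > swapped.tsv

# Print only column 7 of the result for a quick visual check.
cut -f7 swapped.tsv
```

Writing to a new file rather than overwriting the input makes it easy to compare the two with diff before replacing the original.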

Upvotes: 1
