Reputation: 15

Replace string after first semicolon while retaining the string after that

I have a result file, values separated by ; as below:

137;AJP14028.1_VP35;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14037.1_VP35;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14352.1_VP35;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14846.1_VP35;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E

and I want to change the second value (AJP14028.1_VP35) to only AJP14028, without the ".1_VP35" at the back. So the result will be:

137;AJP14028;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14037;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14352;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14846;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E

Any idea on how to do this? I am trying to solve this using either sed or awk but I am not really familiar with them yet.

Upvotes: 0

Answers (4)

potong

Reputation: 58488

This might work for you (GNU sed):

 sed 's/\(;[^.]*\)[^;]*/\1/' file

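Capture the first ; together with everything after it up to, but not including, the first dot. Then match the rest of that field (everything up to the next ;) and drop it by replacing the whole match with the captured group. This relies on the second field containing a dot, as it does in the sample data.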
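To make the explanation concrete, here is a small sketch that runs the command on one record from the question. The variable names and the second, stricter pattern are illustrative assumptions and not part of the original answer; the stricter pattern only shows one way to keep the match inside the second field when that field has no dot.

```shell
# Sample record copied from the question.
line='137;AJP14028.1_VP35;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E'

# The command from this answer: keep ";AJP14028", drop ".1_VP35".
result=$(printf '%s\n' "$line" | sed 's/\(;[^.]*\)[^;]*/\1/')
printf '%s\n' "$result"

# Hypothetical stricter variant (not from the answer): [^.;]* cannot cross
# into the third field, so a record whose second field has no dot is unchanged.
nodot='137;AJP14028;HLA-A*02:01;MVAKYDFLV;0.79200'
strict=$(printf '%s\n' "$nodot" | sed 's/^\([^;]*;[^.;]*\)[^;]*/\1/')
printf '%s\n' "$strict"
```

With the original pattern, a second field without a dot would let [^.]* run on to the first dot later in the line, so the stricter form may be safer if the input is not uniform.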

Upvotes: 0

P....

Reputation: 18411

sed -r 's/(^[^.]*)(\.[^;]*)(.*)/\1\3/' inputfile
137;AJP14028;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14037;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14352;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14846;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E

Here, back-referencing is used to divide the input line into three groups, each delimited by `()`. They are later referred to as \1, \2 and \3.

The first group matches from the start of the line up to, but not including, the first dot. The second group matches from that first dot up to the next semicolon. The third group matches everything after that. The replacement \1\3 keeps the first and third groups and discards the second.
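The three groups can be inspected one at a time, which may help when adapting the pattern. This is only a sketch: the variable names are made up for illustration, the dot in the second group is written as \. so that it is explicitly literal, and -E is used as the extended-regex flag (accepted by GNU sed as an equivalent of -r, and by BSD sed).

```shell
# Sample record copied from the question.
line='137;AJP14028.1_VP35;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E'

# Print each capture group on its own by changing only the replacement.
g1=$(printf '%s\n' "$line" | sed -E 's/(^[^.]*)(\.[^;]*)(.*)/\1/')
g2=$(printf '%s\n' "$line" | sed -E 's/(^[^.]*)(\.[^;]*)(.*)/\2/')
g3=$(printf '%s\n' "$line" | sed -E 's/(^[^.]*)(\.[^;]*)(.*)/\3/')

printf 'group 1: %s\n' "$g1"   # 137;AJP14028
printf 'group 2: %s\n' "$g2"   # .1_VP35
printf 'group 3: %s\n' "$g3"   # ;HLA-A*02:01;...;<-E

# Gluing groups 1 and 3 back together gives the desired record.
printf '%s%s\n' "$g1" "$g3"
```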

Upvotes: 0

dawg

Reputation: 104062

With that input, and focusing on the second field, you can use awk:

$ awk 'BEGIN{FS=OFS=";"} {split($2, arr, /\.1/); $2=arr[1]} 1' file
137;AJP14028;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14037;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14352;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14846;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E

Explanation:

BEGIN{FS=OFS=";"} sets both FS and OFS to ";". This splits the input on the ; character and sets the output field separator to that same character.
split($2, arr, /\.1/) splits the second field on the pattern of a literal .1 and places the pieces in the array arr.
$2=arr[1] is an awk idiom that resets the second field, $2, to the trimmed value. As a side effect, the whole record, $0, is rebuilt using the output field separator, OFS.
1 at the end is another awkism: a pattern that is always true with no action, so the default action (print the current record) runs.
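The steps above can be checked on a single record. The sketch below first prints what split() produces, then runs the full command. The last command is a hypothetical variant, not part of this answer, that assumes everything from the first dot of the second field onward should go, whatever the version digit is.

```shell
# Sample record copied from the question.
line='137;AJP14028.1_VP35;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E'

# split() returns the number of pieces; arr[1] is the text before ".1".
pieces=$(printf '%s\n' "$line" |
  awk 'BEGIN{FS=OFS=";"} {n=split($2, arr, /\.1/); print n, arr[1], arr[2]}')
printf '%s\n' "$pieces"    # 2;AJP14028;_VP35

# The command from this answer: assigning to $2 rebuilds $0 with OFS.
fixed=$(printf '%s\n' "$line" |
  awk 'BEGIN{FS=OFS=";"} {split($2, arr, /\.1/); $2=arr[1]} 1')
printf '%s\n' "$fixed"

# Hypothetical variant: strip from the first dot in field 2, so ".2_VP35"
# and similar suffixes would be removed as well.
alt=$(printf '%s\n' "$line" |
  awk 'BEGIN{FS=OFS=";"} {sub(/\..*/, "", $2)} 1')
printf '%s\n' "$alt"
```

Because both forms only touch $2, dots in the later numeric fields are left alone.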

If you just have the fixed string .1_VP35 to remove (and you do not care whether it is field specific), you can just use sed:

sed 's/\.1_VP35//' file

Upvotes: 2

Claes Wikner

Reputation: 1517

awk '{sub(/\.1_VP35/,"")}1' file

137;AJP14028;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14037;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14352;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E
137;AJP14846;HLA-A*02:01;MVAKYDFLV;0.79200;0.35000;0.87783;0.99826;0.30;<-E

Upvotes: 1
