Reputation: 862

How can I remove the last semicolon in a string?

I'm working with many strings with this structure:

=Cluster=
SPEC    PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=4691 true    LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR  3940.8833   1   9913        0.9988012901749596
SPEC    PRD000681;PRIDE_Exp_Complete_Ac_22495.xml;spectrum=752  true    LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR  3940.8833   1   9913        0.9988012901749596

Due to a bug in the program that generates the files, extra semicolons sometimes appear where there should be just one, and a semicolon also appears where none should appear at all. For example:

=Cluster=
SPEC    PRD000681;;;;;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=4691 true    LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR  3940.8833   1    9913   ;   0.9988012901749596
SPEC    PRD000681;PRIDE_Exp_Complete_Ac_22495.xml;;;;spectrum=752   true    LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR  3940.8833   1    9913   ;   0.9988012901749596

To fix this, I am using the regular expression s/;+/;/g; or awk '{gsub(/[;]+/,";")}1' input > output, but I have no idea how to remove the last semicolon without affecting the first ones.

One good output would be something like this:

=Cluster=
SPEC    PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=4691 true    LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR  3940.8833   1   9913        0.9988012901749596
SPEC    PRD000681;PRIDE_Exp_Complete_Ac_22495.xml;spectrum=752  true    LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR  3940.8833   1   9913        0.9988012901749596

My question is: How could I remove the last semicolon without affecting the first semicolons?

Upvotes: 3

Answers (3)

fedorqui

Reputation: 290165

Using the approach from "How do I replace the last occurrence of a character in a string using sed?", you can say:

sed -r 's/(.*);/\1/' file

That is, .* matches everything up to the last ; on the line. This works because .* is greedy: it matches as much of the line as possible, so the ; that follows it can only be the last one.

Together with your initial expression, you will have:

sed -re 's/;+/;/g' -e 's/(.*);/\1/' file
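Note that -r is a GNU sed option and may not be available in every sed (BSD/macOS sed uses -E for extended regexes). A minimal sketch of the same two substitutions written in POSIX basic regular expressions, so no extended-regex flag is needed; the function name squeeze_and_drop_last is a hypothetical wrapper for illustration only:

```shell
#!/bin/sh
# Hypothetical wrapper name; reads stdin, writes stdout.
squeeze_and_drop_last() {
  # 1) squeeze each run of ';' into a single ';' (POSIX BRE interval \{1,\})
  # 2) delete the last ';' on the line: the greedy \(.*\) captures everything before it
  sed -e 's/;\{1,\}/;/g' -e 's/\(.*\);/\1/'
}

# Usage with the dummy data from this answer:
printf '%s\n' 'hello;;;;;how;are;you' 'i;am;fine' | squeeze_and_drop_last
```

This should print hello;how;areyou and i;amfine, the same as the -r version.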

Since your input lines are long, the effect is hard to see in the output. Here it is with some dummy data:

$ cat file
hello;;;;;how;are;you
i;am;fine

Just remove the last semicolon:

$ sed -r 's/(.*);/\1/' file
hello;;;;;how;areyou
i;amfine

Remove the last semicolon and squeeze multiple semicolons:

$ sed -re 's/;+/;/g' -e 's/(.*);/\1/' file
hello;how;areyou
i;amfine
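One caveat: the commands above assume every line really has a stray semicolon. On a line that is already clean, like the ones in the first sample, the last ; is the legitimate one before spectrum=, so s/(.*);/\1/ would delete it. If the stray ; always stands alone between blanks, as it does in the broken sample lines, a sketch that removes only such a standalone ; could look like this. The helper name fix_semicolons and the "surrounded by blanks" rule are assumptions, not something stated in the question, and the sample lines are shortened from the question's data:

```shell
#!/bin/sh
# Hypothetical helper name.
# ASSUMPTION: the stray ';' is a token of its own, with a blank on each side.
fix_semicolons() {
  # 1) squeeze each run of ';' into a single ';'
  # 2) drop a ';' that has a blank on both sides, keeping the blanks themselves
  sed -e 's/;\{1,\}/;/g' \
      -e 's/\([[:blank:]]\);\([[:blank:]]\)/\1\2/g'
}

# One broken line and one clean line (shortened from the question):
printf '%s\n' \
  'SPEC PRD000681;;;;;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=4691 true 9913 ; 0.9988012901749596' \
  'SPEC PRD000681;PRIDE_Exp_Complete_Ac_22495.xml;spectrum=752 true 9913 0.9988012901749596' |
  fix_semicolons
```

The clean line passes through unchanged, while on the broken line only the standalone ; is removed.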

Upvotes: 6

James Brown

Reputation: 37454

Using rev and awk (and @fedorqui's example):

$ rev file | awk '{ sub(/;/, "") }1' | rev
hello;;;;;how;areyou
i;amfine

Use rev to reverse each record, delete the first ; (which was the last one in the original line) with sub, and then rev the records again. You can use gsub first to squeeze runs of ; into one:

$ rev file | awk '{ gsub(/;+/, ";"); sub(/;/, "") }1' | rev
hello;how;areyou
i;amfine
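The same result is possible in a single awk call without rev. This sketch relies only on POSIX awk's match(), RLENGTH and substr(): since awk matches the regex .*; leftmost-longest, the match starts at column 1 and ends at the last ;, so RLENGTH is that semicolon's position. The function name drop_last_semicolon_awk is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper name; POSIX awk only, reads stdin, writes stdout.
drop_last_semicolon_awk() {
  awk '{
    gsub(/;+/, ";")            # squeeze runs of ";" into one
    if (match($0, /.*;/))      # leftmost-longest: ends at the last ";"
      $0 = substr($0, 1, RLENGTH - 1) substr($0, RLENGTH + 1)
  } 1'
}

# Usage with the same dummy data:
printf '%s\n' 'hello;;;;;how;are;you' 'i;am;fine' | drop_last_semicolon_awk
```

Lines without any ; fail the match() test and are printed unchanged.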

Upvotes: 1

Borodin

Reputation: 126742

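In Perl:

perl -i -pe 's/.*\K;//' myfile

The greedy .* runs up to the last ;, and \K keeps everything matched before it out of the replacement, so only that final ; is deleted. The -i switch edits myfile in place.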

Upvotes: 3
