Reputation: 862
I'm working with many strings that have this structure:
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=4691 true LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR 3940.8833 1 9913 0.9988012901749596
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22495.xml;spectrum=752 true LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR 3940.8833 1 9913 0.9988012901749596
Due to a bug in the program that generates the files, sometimes several semicolons appear where there should be just one, and sometimes a semicolon appears where there should be none. For example:
=Cluster=
SPEC PRD000681;;;;;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=4691 true LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR 3940.8833 1 9913 ; 0.9988012901749596
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22495.xml;;;;spectrum=752 true LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR 3940.8833 1 9913 ; 0.9988012901749596
To fix this I am using the regular expression s/;+/;/g;
or awk '{gsub(/[;]+/,";")}1' input > output
but I have no idea how to remove the last semicolon without affecting the first ones.
A good output would be something like this:
=Cluster=
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22493.xml;spectrum=4691 true LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR 3940.8833 1 9913 0.9988012901749596
SPEC PRD000681;PRIDE_Exp_Complete_Ac_22495.xml;spectrum=752 true LHDEEIQELQAQIQEQHVQIDMDVSKPDLTAALR 3940.8833 1 9913 0.9988012901749596
My question is: How could I remove the last semicolon without affecting the first semicolons?
Upvotes: 3
Views: 3948
Reputation: 290165
Based on How do I replace the last occurrence of a character in a string using sed?, you can say:
sed -r 's/(.*);/\1/' file
That is, match everything with .* until the last ; is found. This works because the .* quantifier is greedy: it consumes as much of the line as possible, so the ; that follows it in the pattern can only match the last ; on the line.
Together with your initial expression, you will have:
sed -re 's/;+/;/g' -e 's/(.*);/\1/' file
Since your input file has so much data, it is hard to see the output. See it live with some dummy data:
$ cat file
hello;;;;;how;are;you
i;am;fine
Just remove the last semicolon:
$ sed -r 's/(.*);/\1/' file
hello;;;;;how;areyou
i;amfine
Remove the last semicolon and squeeze multiple semicolons:
$ sed -re 's/;+/;/g' -e 's/(.*);/\1/' file
hello;how;areyou
i;amfine
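One caveat on the data from the question: the stray semicolon stands alone between two fields, and only some lines have it, so removing "the last semicolon" would also eat the legitimate ; before spectrum= on a line that is already clean. A narrower pattern may be safer. This is only a sketch, assuming the stray ; is always surrounded by whitespace; the function name clean_spec and the shortened sample line are made up for illustration:

```shell
# Hypothetical helper: squeeze runs of ';' into one, then drop any ';'
# that stands alone between whitespace.
# Assumption: the stray semicolon is always whitespace-delimited.
clean_spec() {
  sed -E -e 's/;+/;/g' -e 's/[[:space:]]+;[[:space:]]+/ /g'
}

# Shortened dummy line modelled on the question's format
printf '%s\n' 'SPEC A;;;;B.xml;spectrum=1 true PEP 1.5 1 9913 ; 0.99' | clean_spec
# prints: SPEC A;B.xml;spectrum=1 true PEP 1.5 1 9913 0.99
```

Lines without a free-standing semicolon pass through with only the squeezing applied.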
Upvotes: 6
Reputation: 37454
Using rev and awk (and @fedorqui's example):
$ rev file | awk '{ sub(/;/, "") }1' | rev
hello;;;;;how;areyou
i;amfine
Use rev to reverse the records, delete the first ; with sub instead, and rev the records again. You can use gsub first to replace multiple ;s with one:
$ rev file | awk '{ gsub(/;+/, ";"); sub(/;/, "") }1' | rev
hello;how;areyou
i;amfine
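If rev is not available, the same result can probably be had in awk alone. A minimal sketch, assuming a POSIX awk; the function name drop_last_semicolon is made up for illustration. It relies on match() setting RSTART to the start of the leftmost match of /;[^;]*$/, which can only be the last ; on the line:

```shell
# Hypothetical helper: squeeze runs of ';', then cut out the last ';'
# of each line without reversing it.
drop_last_semicolon() {
  awk '{
    # collapse repeated semicolons into one
    gsub(/;+/, ";")
    # a ";" followed only by non-";" characters up to the end of the line
    # is the last one; RSTART holds its position
    if (match($0, /;[^;]*$/))
      $0 = substr($0, 1, RSTART - 1) substr($0, RSTART + 1)
  } 1'
}

printf '%s\n' 'hello;;;;;how;are;you' 'i;am;fine' | drop_last_semicolon
# prints:
# hello;how;areyou
# i;amfine
```

Lines that contain no semicolon at all are printed unchanged, because match() returns 0 for them.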
Upvotes: 1