J.Carter

Reputation: 331

Regex for replacing space with comma-space, except at end of line

I am trying to convert input file content from this:

NP_418770.2: 257-296 344-415 503-543 556-592 642-707
YP_026226.4: 741-779 811-890 896-979 1043-1077

to this:

NP_418770.2: 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4: 741-779, 811-890, 896-979, 1043-1077

i.e., replace each space with a comma and a space (leaving newlines alone).

For that, I have tried:

perl -pi.bak -e "s/[^\S\n]+/, /g" input.txt

but it gives:

NP_418770.2:, 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4:, 741-779, 811-890, 896-979, 1043-1077

How can I avoid the extra comma that appears after ":" (I want ":" followed by a single space) without writing another regex?

Thanks

Upvotes: 8

Views: 2570

Answers (3)

Casimir et Hippolyte

Reputation: 89584

You can use a word boundary to skip the space that follows the colon, since there is no word boundary between ":" and a space (both are non-word characters): s/\b\h+/, /g

It can be done with perl:

perl -pe's/\b\h+/, /g' file

but also with sed:

sed -E 's/\b[ \t]+/, /g' file

Another approach uses the field separator (split each line on the same pattern, then print the fields joined by ", "):

perl -F'\b\h+' -ane'BEGIN{$,=", "} print @F' file

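or do the same with GNU awk, where the word boundary is written \y (\b means backspace there) and the record has to be rebuilt for OFS to take effect:

awk -F'\\y[ \t]+' -v OFS=', ' '{$1=$1} 1' file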

Upvotes: 4

Niyoko

Reputation: 7672

Try using a negative lookbehind. It checks whether the character before the whitespace is a colon (:), and if it is, that whitespace is not matched.

s/(?<!:)[^\S\n]+/, /g

Upvotes: 10

yoniyes

Reputation: 1030

You were close. That should do the trick:

s/(\d+-\d+)[^\S\n]+/$1, /g

The idea is that I only look at the parts that should get a comma after them, which fit the pattern "digits, then a dash, more digits, then whitespace that's not a newline". The "whitespace that's not a newline" part is written as [^\S\n]+, a negated class meaning "neither a non-whitespace character nor a newline" (\S is everything that \s is not, and we want to exclude the newline too). If you might have trailing whitespace, you can trim it with s/\s+$// before the regex above; just don't forget to add the newline character back afterwards, since the trim removes it as well.

Upvotes: 2
