chan go

Regular expression to remove commas after the first

I have a file that looks like:

16262|John, Doe|John|Doe|JD|etc...

I need to find cases such as:

16262|John, Doe, Dae|John|Doe Dae|JD|etc...

and replace them with:

16262|John, Doe Dae|John|Doe Dae|JD|etc...

In summary, in the second field I want to remove every comma after the first one (there may be more than one).

Any suggestions?

Upvotes: 0

Answers (2)

Casimir et Hippolyte

With GNU sed:

BRE syntax:

sed 's/\(\(^\||\)[^|,]*,\) \?\|, \?/\1 /g;'

ERE syntax:

sed -r 's/((^|\|)[^|,]*,) ?|, ?/\1 /g;'

Details:

(          # group 1: the beginning of an item up to and including its first comma
    (      # group 2:
        ^  # start of the line
      |    # OR
        \| # delimiter
    )
    [^|,]* # item text up to the next | or ,
    ,      # the first comma
)          # close capture group 1
[ ]?       # optional space
|          # OR
,          # another comma
[ ]?       # optional space

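When the first branch succeeds, the first comma is captured in group 1 together with the beginning of the item. Since the replacement string contains a reference to group 1 (\1), the first comma stays unchanged (the optional space after it becomes exactly one space).

When the second branch succeeds, group 1 does not participate in the match, so \1 in the replacement string expands to an empty string and the comma (with its optional space) is replaced by a single space. This is why the other commas are removed. Note that the substitution applies to every |-delimited item on the line, not only to the second field.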
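
To experiment with the same alternation outside sed, here is a minimal JavaScript sketch (assuming Node.js or any modern engine; the function name normalizeCommas is made up for illustration). JavaScript behaves like GNU sed on this point: in the replacement string, $1 for a group that did not participate in the match expands to the empty string.

```javascript
// JavaScript port of the ERE from the sed command above.
// Branch 1 captures an item's start up to its first comma in group 1;
// branch 2 matches any later comma, leaving group 1 unmatched.
const pattern = /((^|\|)[^|,]*,) ?|, ?/g;

// Hypothetical helper: "$1 " keeps the first comma of each item (plus one
// space) and turns every later comma into a single space, because an
// unmatched $1 is replaced by "".
function normalizeCommas(line) {
  return line.replace(pattern, "$1 ");
}

console.log(normalizeCommas("16262|John, Doe, Dae|John|Doe Dae|JD|etc..."));
// prints "16262|John, Doe Dae|John|Doe Dae|JD|etc..."
console.log(normalizeCommas("1|a, b, c|x, y, z"));
// prints "1|a, b c|x, y z"
```

The second call shows that the pattern is not tied to the second field: each item keeps its first comma and loses the later ones.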

Upvotes: 2

CR Drost

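This strongly depends on the language. If your regex engine supports variable-length lookbehind (.NET does, and so does JavaScript since ES2018, although support in older browsers varies), you can replace every match of the regular expression (?<=,.*), with an empty string, which deletes each comma that has another comma before it on the line. If it doesn't (for example older JavaScript engines, or Python's re module, which only allows fixed-width lookbehind), you might still be able to use lookahead if you can reverse a string: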

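// Reverse a string; spreading iterates by code point, so surrogate pairs survive.
const reverse = (s) => [...s].reverse().join("");

// In the reversed string the original first comma is the last one, so remove
// every comma that still has another comma after it, then reverse back.
reverse(reverse("a, b, c, d").replace(/,(?=.*,)/g, ""))
// yields "a, b c d"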
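
On an engine with ES2018 lookbehind (for example a current Node.js), the lookbehind pattern can be used directly, without reversing. The second function below is a hypothetical variant, not part of the answer above, that anchors the lookbehind so only commas in the second |-delimited field are removed; both function names are made up for illustration.

```javascript
// Remove every comma that has another comma somewhere before it on the line.
const anyField = (s) => s.replace(/(?<=,.*),/g, "");

// Hypothetical variant for the second field only: the lookbehind requires the
// first field, a "|", the second field's text up to its first comma, and then
// only non-"|" characters before the comma being removed.
const secondFieldOnly = (s) => s.replace(/(?<=^[^|]*\|[^|,]*,[^|]*),/g, "");

console.log(anyField("a, b, c, d"));
// prints "a, b c d"
console.log(secondFieldOnly("16262|John, Doe, Dae|John|Doe, Dae|JD|etc..."));
// prints "16262|John, Doe Dae|John|Doe, Dae|JD|etc..."
```

Only the comma is deleted, so ", " becomes " " and the spacing in the question's expected output is preserved.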

I don't think regex has other features close enough to lookaround to simulate it easily. Otherwise, you can fall back on the host language: record the index of the first comma, remove all commas, and then reinsert the first comma at that index.
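
A minimal sketch of that index-based idea in JavaScript, restricted to the second field by splitting the line on "|". It assumes fields never contain an escaped or quoted "|", and the name fixSecondField is hypothetical. Instead of removing all commas and reinserting the first one, it keeps everything up to the first comma and strips commas only from the rest, which has the same effect.

```javascript
// Keep the first comma of the second |-delimited field and drop every later
// comma in that field; all other fields are left untouched.
function fixSecondField(line) {
  const fields = line.split("|");
  if (fields.length < 2) return line; // no second field: nothing to do

  const field = fields[1];
  const first = field.indexOf(",");
  if (first !== -1) {
    // Prefix up to and including the first comma stays as-is;
    // the remainder loses all of its commas.
    fields[1] = field.slice(0, first + 1) + field.slice(first + 1).replace(/,/g, "");
  }
  return fields.join("|");
}

console.log(fixSecondField("16262|John, Doe, Dae|John|Doe Dae|JD|etc..."));
// prints "16262|John, Doe Dae|John|Doe Dae|JD|etc..."
```

To process a whole file, the function can be mapped over its lines, e.g. after reading it with fs.readFileSync(path, "utf8") and splitting on "\n".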

Upvotes: 0