Regular expression to remove commas after the first

Question

I have a file that looks like:

16262|John, Doe|John|Doe|JD|etc...

I need to find and replace cases such as:

16262|John, Doe, Dae|John|Doe Dae|JD|etc...

with

16262|John, Doe Dae|John|Doe Dae|JD|etc...

In summary, I want to remove the commas in the second field that come after the first one (there may be more than one of them).

Any suggestions?

Casimir et Hippolyte · Accepted Answer

With GNU sed:

BRE syntax:

sed 's/\(\(^\||\)[^|,]*,\) \?\|, \?/\1 /g;'

ERE syntax:

sed -r 's/((^|\|)[^|,]*,) ?|, ?/\1 /g;'
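To try both commands on the sample record from the question, they can be wrapped in small shell functions. This is a sketch that assumes GNU sed (`\|` and `\?` in BRE, and `-r` for ERE, are GNU features); the function names are hypothetical.

```shell
#!/bin/sh
# The two sed programs from the answer, wrapped in functions for reuse.
# fix_commas_bre / fix_commas_ere are hypothetical names for this sketch.

# BRE: \( \) groups, \| alternates, \? makes the space optional (GNU extensions).
# A bare | is a literal pipe here.
fix_commas_bre() {
    sed 's/\(\(^\||\)[^|,]*,\) \?\|, \?/\1 /g'
}

# ERE: ( ) groups, | alternates, \| is a literal pipe.
fix_commas_ere() {
    sed -r 's/((^|\|)[^|,]*,) ?|, ?/\1 /g'
}

line='16262|John, Doe, Dae|John|Doe Dae|JD|etc...'
printf '%s\n' "$line" | fix_commas_bre
printf '%s\n' "$line" | fix_commas_ere
```

Both calls should print `16262|John, Doe Dae|John|Doe Dae|JD|etc...`, the expected result from the question.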

Details:

(          # group 1: the beginning of an item up to its first comma
    (      # group 2:
        ^  # start of the line
      |    # OR
        \| # delimiter
    )
    [^|,]* # start of the item, up to the next | or ,
    ,      # the first comma
)          # close capture group 1
[ ]?       # optional space
|          # OR
,          # another comma
[ ]?       # optional space
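One way to watch the two branches at work is to temporarily change the replacement to `[\1]`, so the content of group 1 shows up in brackets. The input below is made up for illustration (commas with no following space, in two fields), and GNU sed is assumed for `-r`.

```shell
#!/bin/sh
# Made-up input: commas without a following space, in two different fields.
input='a,b,c|x,y,z'

# Debug view: [\1] shows what group 1 captured for each match.
# Branch 1 matches shows as [a,] or [|x,]; branch 2 matches show as [].
printf '%s\n' "$input" | sed -r 's/((^|\|)[^|,]*,) ?|, ?/[\1]/g'

# The real replacement from the answer, for comparison.
printf '%s\n' "$input" | sed -r 's/((^|\|)[^|,]*,) ?|, ?/\1 /g'
```

The first command prints `[a,]b[]c[|x,]y[]z` and the second prints `a, b c|x, y z`: the first comma of each field is kept (with a space added after it), and every later comma turns into a single space.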

When the first branch matches, the first comma is captured in group 1 together with the beginning of the item. Since the replacement string contains a reference to capture group 1 (\1), that first comma is written back unchanged.

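When the second branch matches, group 1 is not set, so the reference \1 in the replacement string expands to an empty string. This is why each further comma, along with the optional space that follows it, is replaced by just the single space from the replacement.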
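The sed command rewrites every field that contains a comma, not only the second one. If only the second field should change, one alternative (a different tool, not part of the original answer) is awk with `|` as the field separator. This is a minimal sketch; the function name `fix_second_field` is hypothetical.

```shell
#!/bin/sh
# Alternative sketch: change only the second |-separated field,
# leaving commas in all other fields untouched.
fix_second_field() {
    awk 'BEGIN { FS = OFS = "|" }
    {
        i = index($2, ",")              # position of the first comma, 0 if none
        if (i > 0) {
            head = substr($2, 1, i)     # text up to and including that comma
            tail = substr($2, i + 1)    # everything after it
            gsub(/, ?/, " ", tail)      # each later comma (+ optional space) -> " "
            $2 = head tail              # assigning a field rebuilds the record with OFS
        }
        print
    }'
}

printf '%s\n' '16262|John, Doe, Dae|John|Doe Dae|JD|etc...' | fix_second_field
```

This prints `16262|John, Doe Dae|John|Doe Dae|JD|etc...`. Unlike the sed version, it does not add a space after the kept comma when there was none, so `1|a,b,c|x,y` becomes `1|a,b c|x,y`.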
