Reputation: 137
I have a file that looks like:
16262|John, Doe|John|Doe|JD|etc...
I need to find lines such as:
16262|John, Doe, Dae|John|Doe Dae|JD|etc...
and replace them with:
16262|John, Doe Dae|John|Doe Dae|JD|etc...
In summary, I want to remove every comma in the second field except the first one (there may be more than one extra comma).
Any suggestions?
Upvotes: 0
Views: 667
Reputation: 89629
With GNU sed:
BRE syntax:
sed 's/\(\(^\||\)[^|,]*,\) \?\|, \?/\1 /g;'
ERE syntax:
sed -r 's/((^|\|)[^|,]*,) ?|, ?/\1 /g;'
details:
(              # group 1: the beginning of an item up to and including the first comma
    (          # group 2:
        ^      # start of the line
      |        # OR
        \|     # a literal | delimiter
    )
    [^|,]*     # the item's text up to the next | or ,
    ,          # the first comma
)              # end of capture group 1
[ ]?           # optional space
|              # OR
,              # any other comma
[ ]?           # optional space
When the first branch succeeds, group 1 captures the beginning of the item up to and including the first comma. Because the replacement string refers to that group (\1), the first comma is left unchanged.
When the second branch succeeds, group 1 is not set and the reference \1 in the replacement string expands to an empty string. This is why the other commas are removed.
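The unset-group behaviour described above can also be checked outside sed. The following is only a sketch: it ports the ERE pattern to JavaScript (the language used in another answer in this thread), where `$1` likewise expands to an empty string when group 1 did not take part in the match. The helper name `keepFirstComma` is hypothetical; the sample line is the one from the question.

```javascript
// Hypothetical helper: a JavaScript port of the ERE pattern from the sed answer.
function keepFirstComma(line) {
  // Branch 1 captures "<line start or |>text," in group 1.
  // Branch 2 matches any later comma and leaves group 1 unset.
  // "$1 " re-emits group 1 when it is set and collapses to " " when it is not.
  return line.replace(/((^|\|)[^|,]*,) ?|, ?/g, "$1 ");
}

console.log(keepFirstComma("16262|John, Doe, Dae|John|Doe Dae|JD|etc..."));
// prints "16262|John, Doe Dae|John|Doe Dae|JD|etc..."
```

Like the sed command, this rewrites every field of the line, not only the second one, and it inserts a space after a first comma that had none.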
Upvotes: 2
Reputation: 9817
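This depends strongly on the language. If your regex engine supports variable-length lookbehind, you can do this with the regular expression
(?<=,.*),
which matches any comma that has another comma somewhere before it. If you don't have that (JavaScript, for example, had no lookbehind before ES2018), you might still be able to use lookahead if you can reverse the string: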
String.prototype.reverse = function () {
    return this.split("").reverse().join("");
};
"a, b, c, d".reverse().replace(/,(?=.*,)/g, '').reverse()
// yields "a, b c d"
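For comparison, the lookbehind pattern mentioned earlier can be applied directly where it is available. This is a minimal sketch that assumes a JavaScript engine with ES2018 lookbehind support (recent Node.js and browsers); the variable name `withoutLaterCommas` is illustrative.

```javascript
// Assumes ES2018 lookbehind support; older engines reject this regex literal.
// (?<=,.*) only lets a comma match when an earlier comma precedes it,
// so the first comma is never matched and therefore never removed.
const withoutLaterCommas = "a, b, c, d".replace(/(?<=,.*),/g, "");

console.log(withoutLaterCommas);
// prints "a, b c d"
```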
I don't think regex has other features quite like lookaround that could easily simulate it. Alternatively, in a more capable language you can record the index of the first comma, remove all commas, and then reinsert the first one.
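The index-based approach can be sketched without any lookaround. The code below is one possible reading of it, not a definitive implementation: the helper names `keepOnlyFirstComma` and `fixSecondField` are hypothetical, and limiting the change to the second `|`-separated field is an assumption taken from the question.

```javascript
// Hypothetical helper: removes every comma except the first one in a string.
function keepOnlyFirstComma(text) {
  const first = text.indexOf(","); // index of the first comma, -1 if there is none
  if (first === -1) return text;
  const noCommas = text.replace(/,/g, ""); // remove all commas
  // No comma precedes `first`, so that index is still valid in `noCommas`.
  return noCommas.slice(0, first) + "," + noCommas.slice(first);
}

// Hypothetical helper: applies the fix to the second "|"-separated field only.
function fixSecondField(line) {
  const fields = line.split("|");
  if (fields.length > 1) fields[1] = keepOnlyFirstComma(fields[1]);
  return fields.join("|");
}

console.log(fixSecondField("16262|John, Doe, Dae|John|Doe Dae|JD|etc..."));
// prints "16262|John, Doe Dae|John|Doe Dae|JD|etc..."
```

Splitting on the delimiter first leaves commas in other fields untouched, which the regex-only solutions do not guarantee.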
Upvotes: 0