Reputation: 920
I need to convert a list of IDs from using a delimiter consisting of , and/or \r\n or \n to using ,| (essentially s/[,\r\n]+/,\|/g without a trailing |).
Example input data:
123,456,789,012
or
123,
456
789,
012
and I need the resulting output to be 123,|456,|789,|012, (a comma ending each field, and a pipe separating them).
This seems really simple to do, but I'm quite stumped on how to manage this. I've tried ... quite a few ways, actually, but nothing seems to work. Here are a few examples:
sed "s/[,\r\n]+/,\|/g" < filename
does not match any of the delimiters.
sed "s/(,|,?\r?\n?)/,\|/g"
does not match anything either.
tr -t "(,?(\r|\n)+)" ",\|" and tr -t "[,\r\n]+" ",\|"
both replace only the , characters.
tr "(,|\r?\n)" ",\|"
works correctly with , but with ,\n and ,\r\n it replaces the matched characters with multiple bars. Ex: 123|||456|||789|||012|
Getting more complex:
sed ':a;N;$!ba;s/\n/,/g'
(taken from here) replaces \n correctly with , but does not work with \r\n. Replacing the \n with [,\r\n] simply returns the input.
I'm stumped. Can anyone offer some help or advice on this?
Upvotes: 4
Views: 18031
Reputation: 6552
What I do is normalize the \r\n sequence to \n to get rid of one alternative (and increase the speed of the next step).
perl -pi -e 'BEGIN { $/ = undef; } s/\r\n/\n/g; s/[,\n]/,|/g;' filename
Update: from your examples, it looks like you want to replace each run of consecutive delimiters with a single occurrence of ,| and you also want a trailing , after the last field, with no | after it. If that is what you want, change the command to this:
perl -pi -e 'BEGIN { $/ = undef; } s/\r\n/\n/g; s/[,\n]+\z//; s/[,\n]+/,|/g; $_ .= ",\n";' filename
Stripping the delimiters at the very end of the input first avoids a stray | after the last field; the final append then supplies the trailing , after it.
Upvotes: 0
Reputation: 754530
From your sample output, it seems that the output doesn't have a pipe at the end; you have , marking the end of each field, and | separating pairs of fields. For that specification, this works with tr and sed:
$ x="123,
> 456
> 789,
> 012"
$ echo "$x" | tr -s '\r\n' ',' | sed 's/,\(.\)/,|\1/g'
123,|456,|789,|012,
$
The tr command replaces each newline and carriage return with a comma, squeezing (-s) the resulting runs of commas into one. The sed command looks for a comma followed by another character and inserts a | between them, so the final comma, which has nothing after it, gets no pipe.
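To try the pipeline against the other input shapes from the question, it can be wrapped in a small function. This is just a sketch: the name join_ids is made up for illustration, and it assumes the input ends with a newline (or a comma), since that last delimiter is what becomes the trailing comma.

```shell
#!/bin/sh
# join_ids is a hypothetical wrapper name around the tr | sed pipeline.
# tr turns every \r and \n into a comma and squeezes runs of commas to one;
# sed then inserts a pipe after each comma that is followed by another character.
join_ids() {
  tr -s '\r\n' ',' | sed 's/,\(.\)/,|\1/g'
}

# Comma-only input, mixed comma/newline input, and CRLF input:
printf '123,456,789,012\n' | join_ids; echo
printf '123,\n456\n789,\n012\n' | join_ids; echo
printf '123,\r\n456\r\n789,\r\n012\r\n' | join_ids; echo
```

Because tr has already converted the final newline, the pipeline's output may not end with a newline of its own (that depends on the sed implementation), which is why each demo call is followed by echo.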
Upvotes: 3