Reputation: 36639
I have a CSV-like formatted file, e.g.:
1,2,3,4,5,6,7,8
2,3,4,5,6,7,8,9
and I am trying to reformat it to get:
A:2/B:4/C:6
A:3/B:5/C:7
so I wrote a little sed script:
sed -r 's/[0-9]+,\([0-9]+\),[0-9]+,\([0-9]+\),[0-9]+,\([0-9]+\).*/A:\1\/B:\2\/C:\3/'
but it reports an error:
sed: -e expression #1, char 92: invalid reference \3 on `s' command's RHS
Why doesn't it work, and how can I fix it?
Upvotes: 0
Views: 154
Reputation: 5596
You are escaping the parentheses. With -r (extended syntax), \( is very different from (.
When it is escaped, \( matches a literal "(" in the string. Capture groups use () and must not be escaped.
Therefore, you have no capture groups, and back-referencing capture group #3 fails because it does not exist.
You should change \([0-9]+\) to ([0-9]+):
[0-9]+,([0-9]+),[0-9]+,([0-9]+),[0-9]+,([0-9]+).*
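To see the corrected pattern in context, here is a minimal sketch of the full command run against the two sample lines from the question. The variable names `input` and `result` are only illustrative, and it assumes a sed that accepts -r (GNU sed does; -E is the spelling accepted by both GNU and BSD sed).

```shell
# Sample input taken from the question.
input='1,2,3,4,5,6,7,8
2,3,4,5,6,7,8,9'

# With -r, unescaped parentheses create capture groups,
# so \1, \2 and \3 refer to the 2nd, 4th and 6th fields.
result=$(printf '%s\n' "$input" |
  sed -r 's/[0-9]+,([0-9]+),[0-9]+,([0-9]+),[0-9]+,([0-9]+).*/A:\1\/B:\2\/C:\3/')

printf '%s\n' "$result"
# prints:
# A:2/B:4/C:6
# A:3/B:5/C:7
```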
Your regex is also longer than it needs to be. In regex flavours that support \d (Perl, PCRE), it can be shortened to this:
\d+,(\d+),\d+,(\d+),\d+,(\d+).*
# VS #
[0-9]+,([0-9]+),[0-9]+,([0-9]+),[0-9]+,([0-9]+).*
The substitute statement stays the same.
This works because \d is shorthand for [0-9]; it is 3 characters shorter, and given how many times you have written [0-9], that saves a lot of space. Note that sed does not treat \d as a digit class, so with sed keep [0-9] (or use [[:digit:]]).
Upvotes: 2
Reputation: 157947
I would use awk:
awk -F, '{printf "A:%s/B:%s/C:%s\n", $2, $4, $6}' file
Using -F, sets the field delimiter, so each input line is split on commas. printf reassembles the output in the format you want.
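As a quick check, the same awk program can be fed the question's sample lines on standard input instead of a file. This is only a sketch; the variable names `input` and `result` are illustrative.

```shell
# Sample input taken from the question.
input='1,2,3,4,5,6,7,8
2,3,4,5,6,7,8,9'

# -F, splits each line on commas; $2, $4 and $6 are the 2nd, 4th and 6th fields.
result=$(printf '%s\n' "$input" |
  awk -F, '{printf "A:%s/B:%s/C:%s\n", $2, $4, $6}')

printf '%s\n' "$result"
# prints:
# A:2/B:4/C:6
# A:3/B:5/C:7
```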
Upvotes: 2
Reputation:
The problem seems to be the \([0-9]+\) parts of the regex. Here you are escaping ( and ), so they are not actually capturing groups and hence cannot be referenced back.
Try ([0-9]+) instead. ((\d+) would work in Perl-style regexes, but sed does not treat \d as a digit class.)
Upvotes: 1
Reputation: 241768
With -r, regular expressions use the "extended" syntax, under which capturing parentheses shouldn't be escaped with a backslash.
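The difference between the two syntaxes can be sketched side by side. Without -r, sed uses the "basic" syntax, where groups are written \( \); the example below writes [0-9][0-9]* in place of [0-9]+ because + is not special in basic syntax. The variable names `input`, `bre` and `ere` are only illustrative.

```shell
# Sample input taken from the question.
input='1,2,3,4,5,6,7,8
2,3,4,5,6,7,8,9'

# Basic syntax (no -r): capture groups are \( ... \).
bre=$(printf '%s\n' "$input" |
  sed 's/[0-9][0-9]*,\([0-9][0-9]*\),[0-9][0-9]*,\([0-9][0-9]*\),[0-9][0-9]*,\([0-9][0-9]*\).*/A:\1\/B:\2\/C:\3/')

# Extended syntax (-r): capture groups are ( ... ).
ere=$(printf '%s\n' "$input" |
  sed -r 's/[0-9]+,([0-9]+),[0-9]+,([0-9]+),[0-9]+,([0-9]+).*/A:\1\/B:\2\/C:\3/')

# Both commands produce the same two lines.
printf '%s\n' "$bre" "$ere"
```

So either drop -r and keep the backslashes, or keep -r and drop them; mixing the two is what triggers the "invalid reference" error.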
Upvotes: 2