Grzenio

Reputation: 36639

sed fails for regular expression

I have a CSV-like formatted file, e.g.:

1,2,3,4,5,6,7,8
2,3,4,5,6,7,8,9

and I am trying to reformat it to get:

A:2/B:4/C:6
A:3/B:5/C:7

so I wrote a little sed script:

sed -r 's/[0-9]+,\([0-9]+\),[0-9]+,\([0-9]+\),[0-9]+,\([0-9]+\).*/A:\1\/B:\2\/C:\3/'

but it reports an error:

sed: -e expression #1, char 92: invalid reference \3 on `s' command's RHS

Why doesn't it work, and how can I fix it?

Upvotes: 0

Views: 154

Answers (4)

Kaspar Lee

Reputation: 5596

You are escaping the parentheses, and with -r that changes their meaning: \( is very different from (.

Under the extended syntax that -r enables, an escaped \( matches a literal "(" in the input. Capture groups are written with plain (), which must not be escaped.

Therefore your pattern contains no capture groups, and sed reports an error when the replacement back-references group 3, which does not exist.

Change each \([0-9]+\) to ([0-9]+):

[0-9]+,([0-9]+),[0-9]+,([0-9]+),[0-9]+,([0-9]+).*
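For reference, here is a minimal sketch of the corrected pattern dropped back into the original command. It assumes GNU sed (where -r turns on extended syntax); the function name reformat is just a hypothetical wrapper for illustration, and the sample rows are the ones from the question. Using | as the s delimiter is optional, but it avoids having to escape the slashes in the replacement.

```shell
# Hypothetical helper: reads CSV rows on stdin, prints A:<2nd>/B:<4th>/C:<6th>.
reformat() {
  # -r: extended regex, so plain ( ) are capture groups and + means "one or more".
  # The s command uses | as its delimiter, so / in the replacement needs no escaping.
  sed -r 's|[0-9]+,([0-9]+),[0-9]+,([0-9]+),[0-9]+,([0-9]+).*|A:\1/B:\2/C:\3|'
}

# Sample input from the question.
printf '1,2,3,4,5,6,7,8\n2,3,4,5,6,7,8,9\n' | reformat
```

With the sample input this should print A:2/B:4/C:6 followed by A:3/B:5/C:7.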

Live Demo on Regex101


A Shorter RegEx

In regex flavours that support the \d shorthand (such as PCRE), the pattern can be shortened to this:

\d+,(\d+),\d+,(\d+),\d+,(\d+).*
# VS #
[0-9]+,([0-9]+),[0-9]+,([0-9]+),[0-9]+,([0-9]+).*

The substitution part stays the same.

This works because \d is shorthand for [0-9]; it is three characters shorter each time, which adds up given how often [0-9] appears. Note that sed itself does not treat \d as a digit class, so keep [0-9] in the sed command and use \d only with tools that support it, such as perl.

Live Demo on Regex101

Upvotes: 2

hek2mgl

Reputation: 157947

I would use awk:

awk -F, '{printf "A:%s/B:%s/C:%s\n", $2, $4, $6}' file

The -F, option sets the field delimiter to a comma, so each input line is split into comma-separated fields. printf then reassembles the output in the format you want.
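As a quick check, here is a sketch of the same awk approach run against the sample rows from the question. The wrapper name reformat_awk and the NF >= 6 guard are my own additions for illustration (the guard simply skips rows that are too short to have a sixth field); neither is required for the one-liner to work.

```shell
# Hypothetical wrapper around the awk one-liner; reads CSV rows on stdin.
reformat_awk() {
  # -F, splits each line on commas; NF >= 6 skips rows without a 6th field.
  awk -F, 'NF >= 6 { printf "A:%s/B:%s/C:%s\n", $2, $4, $6 }'
}

# Two rows from the question plus one short row that the guard drops.
printf '1,2,3,4,5,6,7,8\n2,3,4,5,6,7,8,9\n1,2,3\n' | reformat_awk
```

Without the guard, a short row would still produce a line, just with empty values after the colons.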

Upvotes: 2

user2705585

Reputation:

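The problem seems to be the parts of the regex written like \([0-9]+\). Because you pass -r, escaping ( and ) makes them literal parentheses rather than a capturing group, so there is nothing to reference back.

Try ([0-9]+) instead. (\d+) works only in tools that support \d, which sed does not.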

Upvotes: 1

choroba

Reputation: 241768

With -r, regular expressions use the "extended" syntax, under which capturing parentheses shouldn't be escaped with a backslash.
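To make the difference concrete, here is a small sketch of the same substitution written both ways, assuming GNU sed. The function names are hypothetical. In the basic (no -r) form the parentheses are escaped to make them groups, and \+ is a GNU extension for "one or more"; on other sed implementations you may need [0-9][0-9]* instead.

```shell
# Extended syntax (-r): plain ( ) capture, + is a quantifier.
reformat_ere() {
  sed -r 's/[0-9]+,([0-9]+),[0-9]+,([0-9]+),[0-9]+,([0-9]+).*/A:\1\/B:\2\/C:\3/'
}

# Basic syntax (no -r): \( \) capture; \+ is a GNU sed extension.
reformat_bre() {
  sed 's/[0-9]\+,\([0-9]\+\),[0-9]\+,\([0-9]\+\),[0-9]\+,\([0-9]\+\).*/A:\1\/B:\2\/C:\3/'
}

# Both should give the same result for the first sample row.
printf '1,2,3,4,5,6,7,8\n' | reformat_ere
printf '1,2,3,4,5,6,7,8\n' | reformat_bre
```

So the original command would also have worked by dropping -r and escaping each +, but removing the backslashes from the parentheses is the smaller change.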

Upvotes: 2
