Reputation: 560
I'm a beginner with regexes and I'm trying to achieve something relatively simple:
I have a dataset arranged like this:
1,AAA,aaaa,BBB,bbbbbb ...
2,AAA,aaaaaaa,BBB,bbb ...
3,AAA,aaaaa,BBB,bb ...
I want to add curly brackets around the variable-length alphanumeric strings that follow AAA or BBB (these are constant):
1,AAA,{aaaa},BBB,{bbbbbb} ...
2,AAA,{aaaaaaa},BBB,{bbb} ...
3,AAA,{aaaaa},BBB,{bb} ...
So I have tried with sed this way:
sed 's/(AAA|BBB)[[:punct:]].[[:alnum:]]/\1{&}/g' dataset.txt
However I got this result:
1,AAA,{AAA,aa}aa,BBB,{BBB,bb}bbbb, ...
2,AAA,{AAA,aa}aaaaa,BBB,{BBB,bb}b, ...
3,AAA,{AAA,aa}aaa,BBB,{BBB,bb} ...
Obviously, the & in the replacement part of sed stands for the whole matched pattern. However, I would like & to be only what comes after the matched pattern. What am I doing wrong?
I have also tried adding word boundaries, after [^ ], to no avail. Am I trying too hard with sed? Should I use a language that allows lookbehind instead?
Thanks for any help!
Upvotes: 2
Views: 2817
Reputation: 1420
Try this:
sed 's/\(AAA\|BBB\),\([^,]*\)/\1,{\2}/g' dataset.txt
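To check this against the sample rows from the question without touching a file, here is a minimal sketch. It assumes GNU sed, since \| inside a basic regex is a GNU extension; the variable names sample and result are just for illustration, and the trailing "..." of each row is dropped.

```shell
# Sample rows taken from the question (trailing "..." omitted).
sample='1,AAA,aaaa,BBB,bbbbbb
2,AAA,aaaaaaa,BBB,bbb
3,AAA,aaaaa,BBB,bb'

# \(AAA\|BBB\) captures the constant key as \1 (\| is a GNU sed extension).
# \([^,]*\) captures everything up to the next comma as \2.
result=$(printf '%s\n' "$sample" | sed 's/\(AAA\|BBB\),\([^,]*\)/\1,{\2}/g')

printf '%s\n' "$result"
# 1,AAA,{aaaa},BBB,{bbbbbb}
# 2,AAA,{aaaaaaa},BBB,{bbb}
# 3,AAA,{aaaaa},BBB,{bb}
```

Because [^,]* takes everything up to the next comma, this variant also wraps values that contain non-alphanumeric characters, which may or may not be what you want.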
Upvotes: 1
Reputation: 784898
The following sed command should work.
On Linux:
sed -i.bak -r 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' dataset.txt
OR on OSX:
sed -i.bak -E 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' dataset.txt
-i.bak edits the input file in place and keeps the original as a backup with the .bak suffix.
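If you want to see what the in-place edit does before running it on real data, a sketch along these lines may help. It works in a throwaway directory created with mktemp; the workdir, edited and backup names are made up for the example, and -E is used because both GNU and BSD sed accept it.

```shell
# Work in a throwaway directory so no real data is modified.
workdir=$(mktemp -d)
printf '%s\n' '1,AAA,aaaa,BBB,bbbbbb' '2,AAA,aaaaaaa,BBB,bbb' > "$workdir/dataset.txt"

# -i.bak rewrites dataset.txt in place and keeps the original as dataset.txt.bak.
# Group 1 is the key plus punctuation, group 2 the key alone, group 3 the value.
sed -i.bak -E 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' "$workdir/dataset.txt"

edited=$(cat "$workdir/dataset.txt")
backup=$(cat "$workdir/dataset.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

# Clean up the throwaway directory.
rm -r "$workdir"
```

The edited copy should contain the bracketed values, while the .bak file still holds the untouched rows.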
Upvotes: 1
Reputation: 213193
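You can always have more than one capture group in your regex, to capture different parts. You can even move the [:punct:] part inside the first capture group. Note that sed does not support non-capturing groups such as (?:...), so the inner alternation counts as group 2 and the value you want ends up in group 3:
sed -E 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' dataset.txt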
I don't understand what the . between [:punct:] and [:alnum:] was doing, so I removed it. As you might have noticed, because of it the regex was matching the following pattern:
{AAA,aa}
{BBB,bb}
i.e., it was matching just two characters after the comma that follows AAA or BBB: one for . and one for [[:alnum:]].
To match all the alphanumeric characters after , up to the next , you need to use a quantifier: [[:alnum:]]+
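One way to see the difference is to mark the matched span instead of rewriting it. The sketch below wraps each match in < > (an arbitrary marker chosen for this illustration) and compares the pattern with the stray . against the one with the + quantifier; the variable names are hypothetical.

```shell
line='1,AAA,aaaa,BBB,bbbbbb'

# Original pattern: "." plus one [[:alnum:]] consumes exactly two characters.
without_quantifier=$(printf '%s\n' "$line" | sed -E 's/(AAA|BBB)[[:punct:]].[[:alnum:]]/<&>/g')

# With the quantifier, the match extends over the whole alphanumeric run.
with_quantifier=$(printf '%s\n' "$line" | sed -E 's/(AAA|BBB)[[:punct:]][[:alnum:]]+/<&>/g')

printf '%s\n' "$without_quantifier"   # 1,<AAA,aa>aa,<BBB,bb>bbbb
printf '%s\n' "$with_quantifier"      # 1,<AAA,aaaa>,<BBB,bbbbbb>
```

Here & still expands to the whole match, which is why the final command uses separate capture groups to keep the key and the value apart.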
Upvotes: 1