adrien

Reputation: 560

Add curly braces to string after a match (sed)

I'm a beginner with regexes and I'm trying to achieve something relatively simple:

I have a dataset arranged like this:

1,AAA,aaaa,BBB,bbbbbb ...
2,AAA,aaaaaaa,BBB,bbb ...
3,AAA,aaaaa,BBB,bb ...

I'm looking to add curly braces around the strings of varying length (alphanumeric characters) that follow AAA or BBB (these are constant):

1,AAA,{aaaa},BBB,{bbbbbb} ...
2,AAA,{aaaaaaa},BBB,{bbb} ...
3,AAA,{aaaaa},BBB,{bb} ...

So I tried sed this way:

sed 's/(AAA|BBB)[[:punct:]].[[:alnum:]]/\1{&}/g' dataset.txt

However, I got this result:

1,AAA,{AAA,aa}aa,BBB,{BBB,bb}bbbb, ... 
2,AAA,{AAA,aa}aaaaa,BBB,{BBB,bb}b, ...
3,AAA,{AAA,aa}aaa,BBB,{BBB,bb} ...

Obviously, the & in the replacement part of sed stands for the whole matched pattern; however, I would like it to be only what comes after the match. What am I doing wrong?

I have also tried adding word boundaries, as well as [^ ], to no avail. Am I trying too hard with sed? Should I use a language that supports lookbehind instead?

Thanks for any help!

Upvotes: 2

Views: 2817

Answers (3)

user1502952

Reputation: 1420

Try this (note that \| alternation inside a basic regex is a GNU sed extension):

sed 's/\(AAA\|BBB\),\([^,]*\)/\1,{\2}/g' dataset.txt
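A quick way to check this command against the sample rows from the question is to pipe them in directly. This is a sketch that assumes GNU sed (because of \|); the function name brace_values is hypothetical and only wraps the command above.

```shell
#!/bin/sh
# Hypothetical helper: wraps the answer's command. Assumes GNU sed,
# since \| alternation in a basic regex is a GNU extension.
# \(AAA\|BBB\) captures the key, \([^,]*\) captures everything up to the next comma.
brace_values() {
  sed 's/\(AAA\|BBB\),\([^,]*\)/\1,{\2}/g'
}

# Sample rows from the question (the trailing "..." columns omitted).
printf '%s\n' '1,AAA,aaaa,BBB,bbbbbb' '2,AAA,aaaaaaa,BBB,bbb' '3,AAA,aaaaa,BBB,bb' | brace_values
```

Because [^,]* stops at the next comma, this version braces any value after AAA or BBB, not just alphanumeric ones.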

Upvotes: 1

anubhava

Reputation: 784898

The following sed command should work.

On Linux (GNU sed):

sed -i.bak -r 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' dataset.txt

Or on OS X (BSD sed):

sed -i.bak -E 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' dataset.txt

-i edits the input file in place, and the .bak suffix makes sed keep the original as dataset.txt.bak.
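To see the in-place edit and the backup side by side, the sketch below runs the command on a throwaway copy of the sample data. It assumes a sed that accepts -E (current GNU and BSD sed both do) and uses a temporary directory so the real dataset.txt is not touched.

```shell
#!/bin/sh
# Throwaway working directory, removed automatically when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Sample rows from the question (the trailing "..." columns omitted).
printf '%s\n' '1,AAA,aaaa,BBB,bbbbbb' '2,AAA,aaaaaaa,BBB,bbb' '3,AAA,aaaaa,BBB,bb' > "$tmp/dataset.txt"

# Group 1 = key plus punctuation, group 2 = AAA|BBB, group 3 = the alphanumeric run.
# -i.bak (suffix attached, no space) edits in place and keeps dataset.txt.bak.
sed -i.bak -E 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' "$tmp/dataset.txt"

cat "$tmp/dataset.txt"      # edited file
cat "$tmp/dataset.txt.bak"  # untouched original
```

Attaching the suffix directly to -i is the form both GNU and BSD sed accept, which is why the same invocation works on Linux and OS X apart from -r versus -E.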

Upvotes: 1

Rohit Jain

Reputation: 213193

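You can have more than one capture group in your regex to capture different parts. You can even move the [:punct:] part inside the first capture group. Note that sed's extended regex syntax (-E, or -r in GNU sed) has no non-capturing (?:...) groups, so the inner (AAA|BBB) is a capture group too and the alphanumeric run ends up in group 3:

sed -E 's/((AAA|BBB)[[:punct:]])([[:alnum:]]+)/\1{\3}/g' dataset.txt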

I don't see what the . between [[:punct:]] and [[:alnum:]] was doing, so I removed it. Because of it, as you may have noticed, your regex was matching the following:

{AAA,aa}
{BBB,bb}

i.e., after AAA and BBB it matched the punctuation character plus just 2 more characters: one for . and one for [[:alnum:]].

To match all the alphanumeric characters after the comma up to the next comma, you need a quantifier: [[:alnum:]]+
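The difference the quantifier makes is easy to see by wrapping the whole match (&) in braces with both patterns. This is a sketch assuming sed -E; the function names old_pattern and new_pattern are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: each wraps the whole match (&) in braces.
# old_pattern: '.' plus one [[:alnum:]] means only two characters follow the comma.
old_pattern() { sed -E 's/(AAA|BBB)[[:punct:]].[[:alnum:]]/{&}/g'; }
# new_pattern: [[:alnum:]]+ greedily takes the whole run up to the next comma.
new_pattern() { sed -E 's/(AAA|BBB)[[:punct:]][[:alnum:]]+/{&}/g'; }

line='1,AAA,aaaa,BBB,bbbbbb'
printf '%s\n' "$line" | old_pattern
printf '%s\n' "$line" | new_pattern
```

The first call reproduces the {AAA,aa} and {BBB,bb} matches from the question; the second captures the full values.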

Upvotes: 1
