Lazao

Reputation: 225

Regex: How to match pattern only once?

I'm trying to extract data from a .config file (generated by kconfig). The default format is:

SYMBOL=y (in case of a bool)
SYMBOL="str" (in case of a string)

I managed to get it working with the following regex:

sed -e '/^#/d;s/\(.+\)=\(.+\)/def \1 "\1"\n/g' configfile > formattedfile

It works for every case except this one:

SYMBOL="http://my.domain/toast?id=150"

As a result, my output file contains:

def SYMBOL="http://my.domain/toast?id "SYMBOL="http://my.domain/toast?id="

This happens because the pattern XXX=XXX appears twice in this line. How can I avoid this?

Regards,

Upvotes: 3

Views: 3111

Answers (3)

Jens

Reputation: 72707

The problem is that .+ is greedy: it tries to match the longest possible string, so the first group extends up to the second (last) = on the line. Since identifiers can't contain an = character, it is best to be more specific in matching the first part:

sed -e '/^#/d;s/^\([^=]*\)=\(.*\)/def \1 \2\n/' configfile > formattedfile

Note that I changed the second \1 to \2, since I think this is what you meant. I also avoided the extended regular expression quantifier + in favor of the basic regular expression quantifier *, which is more portable.
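To see why the negated class helps, here is a small sketch run against a sample file. The file name configfile and the comment line are made up for illustration; the three SYMBOL lines come from the question. It contrasts the greedy \(.*\)= with \([^=]*\)= on the problem line, then runs the conversion from this answer. The \n from the answer's replacement is left out here: in GNU sed it only adds a blank line after each def, and \n in a replacement is not portable to every sed (BSD sed, for example, prints a literal n).

```shell
#!/bin/sh
# Hypothetical sample file "configfile" mirroring the cases from the question.
printf '%s\n' \
  '# a comment line, deleted by /^#/d' \
  'SYMBOL=y' \
  'SYMBOL="str"' \
  'SYMBOL="http://my.domain/toast?id=150"' > configfile

# Greedy: \(.*\) takes as much as it can while still leaving one "=",
# so group 1 runs up to the LAST "=" and swallows the first one.
echo 'SYMBOL="http://my.domain/toast?id=150"' | sed 's/^\(.*\)=.*/\1/'
# prints: SYMBOL="http://my.domain/toast?id

# Negated class: [^=]* can never contain "=", so group 1 stops at the FIRST "=".
echo 'SYMBOL="http://my.domain/toast?id=150"' | sed 's/^\([^=]*\)=.*/\1/'
# prints: SYMBOL

# Full conversion; group 2 (.*) keeps everything after the first "=",
# including any later "=" inside the value.
sed -e '/^#/d;s/^\([^=]*\)=\(.*\)/def \1 \2/' configfile
# prints:
# def SYMBOL y
# def SYMBOL "str"
# def SYMBOL "http://my.domain/toast?id=150"
```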

Upvotes: 1

Avinash Raj

Reputation: 174786

You need to escape the + symbol (as \+) and also change the first .+ to [^=]\+, because .+ is greedy and matches up to the last = symbol.

$ sed -e '/^#/d;s/\([^=]\+\)=\(.\+\)/def \1 "\1"\n/g' file
def SYMBOL "SYMBOL"

def SYMBOL "SYMBOL"

def SYMBOL "SYMBOL"
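The same fix can also be written in extended regex syntax, where + and ( ) are special without backslashes. This is a sketch assuming a sed that accepts -E (GNU sed and the BSD/macOS sed both do); the two input lines are taken from the question, and the replacement mirrors the one above.

```shell
#!/bin/sh
# ERE version: [^=]+ still cannot cross an "=", so group 1 ends at the first "=".
# Group 2 is unused, exactly as in the BRE command above.
printf '%s\n' 'SYMBOL=y' 'SYMBOL="http://my.domain/toast?id=150"' |
  sed -E 's/^([^=]+)=(.+)/def \1 "\1"/'
# prints:
# def SYMBOL "SYMBOL"
# def SYMBOL "SYMBOL"
```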

Upvotes: 1

fedorqui

Reputation: 290055

Just drop the g in your command:

sed -e '/^#/d;s/\(.+\)=\(.+\)/def \1 "\1"\n/'
                                            ^

instead of

sed -e '/^#/d;s/\(.+\)=\(.+\)/def \1 "\1"\n/g'
                                            ^

From info sed:

`g'
     Apply the replacement to _all_ matches to the REGEXP, not just the
     first.

See another example:

$ echo "hello" | sed 's/l/X/'   #without g
heXlo
$ echo "hello" | sed 's/l/X/g'  #with g
heXXo

Upvotes: 0
