Reputation: 225
I'm trying to extract data from a .config file (generated by using kconfig). The default format is:
SYMBOL=y (in case of a bool)
SYMBOL="str" (in case of a string)
I managed to get it mostly working with the following sed command:
sed -e '/^#/d;s/\(.+\)=\(.+\)/def \1 "\1"\n/g' configfile > formattedfile
It works in every case except this one:
SYMBOL="http://my.domain/toast?id=150"
As a result, I have in my output file:
def SYMBOL="http://my.domain/toast?id "SYMBOL="http://my.domain/toast?id="
This happens because the pattern XXX=XXX appears twice in this line. How can I avoid this?
Regards,
Upvotes: 3
Views: 3111
Reputation: 72707
The problem is that .+ is greedy: it tries to match the longest possible string. This extends to the second =. Since identifiers can't contain a = character, it is best to be more specific in matching the first part:
sed -e '/^#/d;s/^\([^=]*\)=\(.*\)/def \1 \2\n/' configfile > formattedfile
Note that I changed the second \1 to \2 since I think this is what you meant. I also avoided the extended regular expression quantifier + in favor of the basic regular expression quantifier *, which is more portable.
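To make the greedy behaviour visible, here is a minimal sketch that runs both patterns on the problematic line from the question and prints what each capture group receives. The variable names (line, greedy, bounded) and the key=[...] value=[...] output layout are only illustrative and are not part of the original command.

```shell
# Sample line taken from the question.
line='SYMBOL="http://my.domain/toast?id=150"'

# Greedy: .* runs up to the LAST '=', so group 1 swallows most of the URL.
greedy=$(printf '%s\n' "$line" | sed 's/^\(.*\)=\(.*\)$/key=[\1] value=[\2]/')

# Negated class: [^=]* cannot cross an '=', so group 1 stops at the first one.
bounded=$(printf '%s\n' "$line" | sed 's/^\([^=]*\)=\(.*\)$/key=[\1] value=[\2]/')

printf '%s\n%s\n' "$greedy" "$bounded"
```

The first line printed should show the key extending to id, while the second shows the key as just SYMBOL and the whole quoted URL as the value.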
Upvotes: 1
Reputation: 174786
You need to escape the + symbol and also change the first .+ to [^=]\+, because .+ is greedy and matches up to the last = symbol.
$ sed -e '/^#/d;s/\([^=]\+\)=\(.\+\)/def \1 "\1"\n/g' file
def SYMBOL "SYMBOL"
def SYMBOL "SYMBOL"
def SYMBOL "SYMBOL"
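The point about escaping + can be checked with a small sketch. It assumes GNU sed, where \+ in a basic regular expression means "one or more" (a GNU extension), and a sed that accepts -E for extended regular expressions. The variable names are illustrative, and the replacement uses \2 for the value rather than repeating \1 as the command above does.

```shell
line='SYMBOL="str"'

# Basic regex with the GNU \+ extension: one or more non-'=' characters.
bre=$(printf '%s\n' "$line" | sed 's/^\([^=]\+\)=\(.\+\)$/def \1 \2/')

# Extended regex (-E): + is a quantifier and groups need no backslashes.
ere=$(printf '%s\n' "$line" | sed -E 's/^([^=]+)=(.+)$/def \1 \2/')

# Basic regex with a bare +: the + is a literal character, so nothing matches
# and the line passes through unchanged.
literal=$(printf '%s\n' "$line" | sed 's/^\([^=]+\)=\(.+\)$/def \1 \2/')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$literal"
```

The first two commands should both print def SYMBOL "str", while the third leaves the input line untouched because it is looking for a literal + sign.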
Upvotes: 1
Reputation: 290055
Just drop the g in your command:
sed -e '/^#/d;s/\(.+\)=\(.+\)/def \1 "\1"\n/'
                                            ^
instead of
sed -e '/^#/d;s/\(.+\)=\(.+\)/def \1 "\1"\n/g'
                                            ^
From info sed:
`g'
    Apply the replacement to _all_ matches to the REGEXP, not just the first.
See another example:
$ echo "hello" | sed 's/l/X/' #without g
heXlo
$ echo "hello" | sed 's/l/X/g' #with g
heXXo
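The same flag can be observed on the line from the question. This is only a sketch using a simpler substitution than the command above (replacing = with a space is an assumption made for illustration), and the variable names are hypothetical.

```shell
line='SYMBOL="http://my.domain/toast?id=150"'

# Without g: only the first '=' on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/=/ /')

# With g: every '=' is replaced, including the one inside the URL.
every=$(printf '%s\n' "$line" | sed 's/=/ /g')

printf '%s\n%s\n' "$first_only" "$every"
```

Without g the URL keeps its id=150 part intact; with g the second = is replaced as well.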
Upvotes: 0