user1292603

Reputation: 327

Extract multiple occurrences on the same line using sed/regex

I am trying to loop through each line in a file and extract the text that appears between ${ and }. For inputfile.sh below, the expected output is only SOLDIR and TEMP.

I have tried the following script, but it extracts only the second occurrence of the pattern, TEMP. Adding g at the end doesn't help either. Could anybody please tell me how to match and extract multiple occurrences on the same line?

inputfile.sh:

.  
.  
SOLPORT=\`grep -A 4 '\[LocalDB\]' \${SOLDIR}/solidhac.ini | grep \${TEMP} | awk '{print $2}'\`  
.  
.  

script.sh:

infile='inputfile.sh'  
while read line ; do    
  echo $line | sed 's%.*${\([^}]*\)}.*%\1%g'  
done < "$infile"  

Upvotes: 12

Views: 9003

Answers (3)

torbiak

Reputation: 146

Extracting multiple matches from a single line using sed isn't as bad as I thought it'd be, but it's still fairly esoteric and difficult to read:

$ echo 'Hello ${var1}, how is your ${var2}' | sed -En '
    # Replace ${PREFIX}${TARGET}${SUFFIX} with ${PREFIX}\a${TARGET}\n${SUFFIX}
    s#\$\{([^}]+)\}#\a\1\n#
    # Continue to next line if no matches.
    /\n/!b
    # Remove the prefix.
    s#.*\a##
    # Print up to the first newline.
    P
    # Delete up to the first newline and reprocess what's left of the line.
    D
'
var1
var2

And all on one line:

sed -En 's#\$\{([^}]+)\}#\a\1\n#;/\n/!b;s#.*\a##;P;D'

Since POSIX extended regexes don't support non-greedy quantifiers or putting a newline escape in a bracket expression, I've used a BEL character (\a) as a sentinel at the end of the prefix instead of a newline. A newline could be used, but then the second substitution would have to be the questionable s#.*\n(.*\n.*)#\1#, which might involve a pathological amount of backtracking by the regex engine.
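
To make the greedy-matching point concrete, here is a small sketch that runs the substitution from the question and the one-liner above on the same sample line. It assumes GNU sed (for \a and \n in the replacement); the variable names line, greedy and looped are only illustrative.

```shell
# Sample line from the example above.
line='Hello ${var1}, how is your ${var2}'

# The question's approach: the leading .* is greedy, so it swallows
# everything up to the LAST "${" and only the final name survives.
greedy=$(printf '%s\n' "$line" | sed 's%.*${\([^}]*\)}.*%\1%')

# The P/D loop: mark one match, print it, delete it, and reprocess the
# rest of the line, so every name is emitted on its own line.
looped=$(printf '%s\n' "$line" | sed -En 's#\$\{([^}]+)\}#\a\1\n#;/\n/!b;s#.*\a##;P;D')

printf 'greedy: %s\n' "$greedy"
printf 'looped:\n%s\n' "$looped"
```

The g flag does not help the first command because the pattern is bracketed by .* on both sides, so a single match already consumes the whole line and there is nothing left to match again.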

Upvotes: 1

Lev Levitsky

Reputation: 65791

May I propose a grep solution?

grep -oP '(?<=\${).*?(?=})'

It uses Perl-style lookaround assertions and lazily matches anything between '${' and '}'.

Feeding your line to it, I get

$ echo "SOLPORT=\`grep -A 4 '[LocalDB]' \${SOLDIR}/solidhac.ini | grep \${TEMP} | awk '{print $2}'\`" | grep -oP '(?<=\${).*?(?=})'
SOLDIR
TEMP
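
Some grep builds (BSD grep, for example) lack -P. As a rough fallback sketch, not part of the command above, a negated bracket expression can stand in for the lazy match, with sed stripping the delimiters afterwards. grep can also read the file directly, so the while loop is not needed. The temporary file here is a hypothetical stand-in for inputfile.sh, and -o is assumed to be available (it is in both GNU and BSD grep).

```shell
# Hypothetical stand-in for inputfile.sh, holding the line from the question.
infile=$(mktemp)
cat > "$infile" <<'EOF'
SOLPORT=`grep -A 4 '\[LocalDB\]' ${SOLDIR}/solidhac.ini | grep ${TEMP} | awk '{print $2}'`
EOF

# grep -o prints every match on its own line. [^}]* cannot run past a
# closing brace, so each match stays inside a single ${...}.
# sed then strips the leading "${" and the trailing "}".
vars=$(grep -o '\${[^}]*}' "$infile" | sed 's/^\${//; s/}$//')

printf '%s\n' "$vars"
rm -f "$infile"
```

The $2 inside the awk program is not picked up, because the pattern requires a literal ${ before the name.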

Upvotes: 14

Zsolt Botykai

Reputation: 51613

This might work for you (but maybe only for your specific input line):

sed 's/[^$]*\(${[^}]\+}\)[^$]*/\1\t/g;s/$[^{$]\+//g'
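
As a sketch of what this produces on the line from the question: assuming GNU sed (the \+ and \t escapes are GNU extensions), the command leaves the ${...} wrappers in place and separates the matches with tabs, so one more step is needed to get bare names. The tr/sed post-processing and the variable names below are my own illustration, not part of the command above.

```shell
# Read the question's line verbatim (quoted here-document, no expansion).
IFS= read -r line <<'EOF'
SOLPORT=`grep -A 4 '\[LocalDB\]' ${SOLDIR}/solidhac.ini | grep ${TEMP} | awk '{print $2}'`
EOF

# First s: keep each ${...} and drop the text around it, appending a tab.
# Second s: remove leftovers that start with "$" but are not "${", such as "$2}'`".
raw=$(printf '%s\n' "$line" | sed 's/[^$]*\(${[^}]\+}\)[^$]*/\1\t/g;s/$[^{$]\+//g')

# raw is now "${SOLDIR}<TAB>${TEMP}<TAB>"; split on tabs and unwrap each name.
names=$(printf '%s\n' "$raw" | tr '\t' '\n' | sed -n 's/^\${\(.*\)}$/\1/p')

printf '%s\n' "$names"
```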

Upvotes: 2
