prabodhprakash

Reputation: 3927

extract substring using regex in shell script

The strings could be of the form:

  1. com.company.$(PRODUCT_NAME:rfc1034identifier)
  2. $(PRODUCT_BUNDLE_IDENTIFIER)
  3. com.company.$(PRODUCT_NAME:rfc1034identifier).$(someRandomVariable)

I need help writing a regex that extracts every string inside $(..).

I created a regex like ([(])\w+([)]), but when I try to execute it in a shell script, I get an unmatched-parenthesis error.

This is what I executed:

echo "com.io.$(sdfsdfdsf)"|grep -P '([(])\w+([)])' -o

I need to get all matching substrings.

Upvotes: 1

Views: 6038

Answers (3)

ghoti

Reputation: 46886

Your question specifies "shell", but not "bash". So I'll start with a common shell-based tool (awk) rather than assuming you can use any particular set of non-POSIX built-ins.

$ cat inp.txt

com.company.$(PRODUCT_NAME:rfc1034identifier)
$(PRODUCT_BUNDLE_IDENTIFIER)
com.company.$(PRODUCT_NAME:rfc1034identifier).$(someRandomVariable)

$ awk -F'[()]' '{for(i=2;i<=NF;i+=2){print $i}}' inp.txt

PRODUCT_NAME:rfc1034identifier
PRODUCT_BUNDLE_IDENTIFIER
PRODUCT_NAME:rfc1034identifier
someRandomVariable

This awk one-liner defines a field separator consisting of an opening or closing parenthesis. With that separator, every even-numbered field is the content you're looking for, assuming every input line is correctly formatted and no parentheses are nested inside other parentheses.

If you did want to do this in POSIX shell alone, the following would be an option:

#!/bin/sh

while IFS= read -r line; do
  while expr "$line" : '.*(' >/dev/null; do
    line="${line#*(}"
    echo "${line%%)*}"
  done
done < inp.txt

This steps through each line of input, slicing it up at the parentheses and printing each slice. Note that this uses expr, which is most likely an external binary but is at least specified by POSIX.1.
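
If avoiding the external expr call matters, the same slicing can be sketched with a case pattern and parameter expansion only. This is a minimal sketch rather than the code above: extract_groups is a hypothetical helper name, and it assumes $(...) groups are never nested.

```shell
#!/bin/sh
# extract_groups (hypothetical name): print the text inside each $(...) in $1,
# one result per line, using only POSIX shell built-ins.
extract_groups() {
  line=$1
  while :; do
    case $line in
      *'$('*')'*) ;;   # a complete $(...) group is still present
      *) break ;;      # nothing left to extract
    esac
    line=${line#*'$('}            # drop everything up to the first "$("
    printf '%s\n' "${line%%)*}"   # print up to the next ")"
    line=${line#*')'}             # continue after that ")"
  done
}

extract_groups 'com.company.$(PRODUCT_NAME:rfc1034identifier).$(someRandomVariable)'
```

The case test replaces expr as the loop condition, so the function stops as soon as no complete group remains.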

Upvotes: 1

user2317487

Reputation:

You can do it quite simply with sed:

echo 'com.io.$(asdfasdf)'|sed -e 's/.*(\(.*\))/\1/g'

Gives

asdfasdf

For two fields:

echo 'com.io.$(asdfasdf).$(ddddd)'|sed -e 's/.*(\(.*\)).\$(\(.*\))/\1 \2/g'

Gives

asdfasdf ddddd

Explanation:

sed -e 's/.*(\(.*\))/\1/g'
          \_/\____/  \/
           |    |     |_ print the placeholder content
           |    |___ placeholder selecting the text inside the parentheses
           |____ select the text from the beginning up to and including the opening parenthesis
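
Because .* is greedy, the single-group command keeps only the last group when a line contains several. A sketch that handles any number of groups, assuming no nested parentheses, swaps .* for negated bracket expressions. Here join_groups is a hypothetical helper name, and lines without parentheses pass through unchanged.

```shell
#!/bin/sh
# join_groups (hypothetical name): for each line on stdin, print the contents
# of every (...) group, separated by single spaces.
join_groups() {
  # [^(]* skips text outside a group; \([^)]*\) captures one group's contents.
  # The second expression removes the trailing space left by the last group.
  sed -e 's/[^(]*(\([^)]*\))[^(]*/\1 /g' -e 's/ $//'
}

printf '%s\n' 'com.io.$(asdfasdf).$(ddddd)' | join_groups
```

For the two-group input this prints asdfasdf ddddd, the same result as the two-field command, without hard-coding the number of groups.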

Upvotes: 2

anubhava

Reputation: 786091

The problem is the use of double quotes in the echo command, which makes the shell interpret $(...) as a command substitution.

You can use single quotes:

echo 'com.io.$(sdfsdfdsf)' | grep -oP '[(]\w+[)]'
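
Note that \w matches only letters, digits, and underscores, so this pattern would not match a group containing a colon, such as $(PRODUCT_NAME:rfc1034identifier). A hedged sketch that accepts anything up to the closing parenthesis and then strips the delimiters might look like the following. list_groups is a hypothetical helper name, and the sketch assumes a grep that supports -o, as GNU and BSD grep do.

```shell
#!/bin/sh
# list_groups (hypothetical name): print the contents of each $(...) group
# found in $1, one per line.
list_groups() {
  printf '%s\n' "$1" |
    grep -o '\$([^)][^)]*)' |            # each match looks like $(...)
    sed -e 's/^\$(//' -e 's/)$//'        # strip the leading $( and trailing )
}

list_groups 'com.company.$(PRODUCT_NAME:rfc1034identifier).$(someRandomVariable)'
```

This uses basic regular expressions only, so it does not depend on grep being built with -P support.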

Here is an alternative using Bash's built-in regex matching:

$> re='[(][^)]+[)]'
$> [[ 'com.io.$(sdfsdfdsf)' =~ $re ]] && echo "${BASH_REMATCH[0]}"
(sdfsdfdsf)
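
A single [[ =~ ]] test captures only the first group, and BASH_REMATCH[0] includes the parentheses. To collect every group, as the question asks, one option is to loop and trim the matched part off the string each time. This is a minimal sketch that assumes Bash (for [[ =~ ]] and BASH_REMATCH); extract_all is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# extract_all (hypothetical name): print the contents of every $(...) group
# in $1, one per line. Requires Bash.
extract_all() {
  local rest=$1
  local re='\$\(([^)]+)\)'   # capture group 1 holds the text inside $(...)
  while [[ $rest =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[1]}"
    # Remove everything up to and including this match, then search again.
    rest=${rest#*"${BASH_REMATCH[0]}"}
  done
}

extract_all 'com.company.$(PRODUCT_NAME:rfc1034identifier).$(someRandomVariable)'
```

Quoting ${BASH_REMATCH[0]} inside the parameter expansion makes it a literal string rather than a glob pattern, so characters in the match cannot be misread as wildcards.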

Upvotes: 2
