Angelo

Reputation: 360

Bash: need to find text within matching parentheses

I have some text that looks like this:

(something1)something2

However, something1 and something2 might also contain parentheses, such as:

(some(thing)1)something(2)

I want to extract something1 (including internal parentheses, if there are any) to a variable. Since I can count on the text always starting with an opening parenthesis, I'm hoping that I can match the first parenthesis to its correct closing parenthesis and extract everything in between.

Everything I have tried so far has the potential to match the wrong closing parenthesis.
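
To illustrate the problem, here is a minimal sketch of two obvious attempts with bash parameter expansion, both of which pick the wrong closing parenthesis. The variable names are only for this example:

```shell
#!/usr/bin/env bash
# Sample input taken from the example above.
text='(some(thing)1)something(2)'

rest=${text#\(}           # drop the leading "("

# Attempt 1: cut at the FIRST ")" (stops too early).
shortest=${rest%%\)*}     # remove the longest suffix that starts with ")"
echo "$shortest"          # prints: some(thing

# Attempt 2: cut at the LAST ")" (runs too far).
longest=${rest%\)*}       # remove the shortest suffix that starts with ")"
echo "$longest"           # prints: some(thing)1)something(2
```

Neither expansion knows anything about nesting, so neither can return some(thing)1.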

Upvotes: 3

Views: 3735

Answers (4)

clt60

Reputation: 63912

If you have Perl, then:

perl -MText::Balanced -nlE 'say [Text::Balanced::extract_bracketed( $_, "()" )]->[0]' <<EOF
(something1)something2
(some(thing)1)something(2)
(some(t()()hing)()1)()something(2)
EOF

prints:

(something1)
(some(thing)1)
(some(t()()hing)()1)

Upvotes: 4

Lorkenpeist

Reputation: 1495

awk can do it:

#!/bin/awk -f
{
   numLeft = numRight = 0   # reset the counters for every input line
   for (i=1; i<=length; ++i) {
      if (numLeft == 0 && substr($0, i, 1) == "(") {
         leftPos = i
         numLeft = 1
      } else if (substr($0, i, 1) == "(") {
         ++numLeft
      } else if (substr($0, i, 1) == ")") {
         ++numRight
      }
      if (numLeft && numLeft == numRight) {
         print substr($0, leftPos, i-leftPos+1)
         next
      }
   }
}

Input:

(something1)something2
(some(thing)1)something(2)

Output:

(something1)
(some(thing)1)

Upvotes: 1

Casimir et Hippolyte

Reputation: 89557

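You can do it with perl (5.10 or later), using a recursive pattern so that each nested pair of parentheses is matched as a unit:

echo "(some(thing)1)something(2)" | perl -ne 'print "$1\n" if /^(\((?:[^()]++|(?1))*\))/'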

Upvotes: 2

Angelo

Reputation: 360

Since this is apparently not possible with plain regular expressions, I have resorted to picking off the characters one by one:

first=""
count=0
while [[ -n "$string" ]]
do
    char=${string:0:1}  # Get the first character
    if [[ "$char" == ")" ]]
    then
        count=$(( count - 1 ))
    fi
    if (( count > 0 ))  # Numeric test; [[ $count > 0 ]] would compare strings
    then
        first="$first$char"
    fi
    if [[ "$char" == "(" ]]
    then
        count=$(( count + 1 ))
    fi
    string=${string:1}  # Trim the first character
    if (( count == 0 ))  # The first group has just been closed
    then
        second="$string"  # Everything after the closing parenthesis
        string=""         # Empty the string to end the loop
    fi
done
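
The same counting idea can be wrapped in a function, which makes the result easier to capture in a variable. This is only a sketch: the name first_group is made up for this example, and it assumes the input starts with an opening parenthesis (it returns a non-zero status otherwise, or if the parentheses never balance).

```shell
#!/usr/bin/env bash
# first_group (hypothetical helper): print the contents of the first
# balanced "(...)" group of its argument, without the outer parentheses.
first_group() {
    local s=$1 depth=0 i char
    [[ ${s:0:1} == "(" ]] || return 1       # input must start with "("
    for (( i = 0; i < ${#s}; i++ )); do
        char=${s:i:1}
        if [[ $char == "(" ]]; then
            depth=$(( depth + 1 ))
        elif [[ $char == ")" ]]; then
            depth=$(( depth - 1 ))
            if (( depth == 0 )); then
                # Characters 1 .. i-1 lie between the matching parentheses.
                printf '%s\n' "${s:1:i-1}"
                return 0
            fi
        fi
    done
    return 1                                # ran out of input: unbalanced
}

first=$(first_group '(some(thing)1)something(2)')
echo "$first"    # prints: some(thing)1
```

Walking the string by index, rather than trimming it one character at a time, leaves the original value untouched, so the remainder can still be taken from it afterwards if needed.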

Upvotes: 2
