Reputation: 360
I have some text that looks like this:
(something1)something2
However something1 and something2 might also have some parentheses inside them such as
(some(thing)1)something(2)
I want to extract something1
(including internal parentheses if there are any) to a variable. Since I can count on the text always starting with an opening parentheses, I'm hoping that I can do something where I match the first parenthesis to the correct closing parentheses, and extract the middle.
Everything I have tried so far has the potential to match the wrong ending parentheses.
Upvotes: 3
Views: 3735
Reputation: 63912
If you have perl, the:
perl -MText::Balanced -nlE 'say [Text::Balanced::extract_bracketed( $_, "()" )]->[0]' <<EOF
(something1)something2
(some(thing)1)something(2)
(some(t()()hing)()1)()something(2)
EOF
will prints
(something1)
(some(thing)1)
(some(t()()hing)()1)
Upvotes: 4
Reputation: 1495
awk
can do it:
#!/bin/awk -f
{
for (i=1; i<=length; ++i) {
if (numLeft == 0 && substr($0, i, 1) == "(") {
leftPos = i
numLeft = 1
} else if (substr($0, i, 1) == "(") {
++numLeft
} else if (substr($0, i, 1) == ")") {
++numRight
}
if (numLeft && numLeft == numRight) {
print substr($0, leftPos, i-leftPos+1)
next
}
}
}
Input:
(something1)something2
(some(thing)1)something(2)
Output:
(something1)
(some(thing)1)
Upvotes: 1
Reputation: 89557
You can do it with perl:
echo "(some(thing)1)something(2)" | perl -ne '$_ =~ /(\((?:\(.*\)|[^(])*\))|\w+/s; print $1;'
Upvotes: 2
Reputation: 360
Since this is apparently something that is impossible with regular expressions, I have resorted to pickup the the characters 1 by 1:
first=""
count=0
while test -n "$string"
do
char=${string:0:1} # Get the first character
if [[ "$char" == ")" ]]
then
count=$(( $count - 1 ))
fi
if [[ $count > 0 ]]
then
first="$first$char"
fi
if [[ "$char" == "(" ]]
then
count=$(( $count + 1 ))
fi
string=${string:1} # Trim the first character
if [[ $count == 0 ]]
then
second="$string"
string=""
fi
done
Upvotes: 2