TeAmEr

Reputation: 4773

Regular expression to get text between brackets that itself contains bracketed text

After trying 10 times to rewrite this question so that it would be accepted: I have a short text that contains text between brackets. I want to extract that text, so I wrote this expression:

/(\([^\)]+\))/i

but this only matches from the first ( to the first ), so it returns (i want(to) and ignores the rest of the text. Is there any way to extract the full text, like:

i want(to) extract this text

from :

this is the text that (i want(to) extract this text) from

There might be more than one bracket-enclosed sub-text.

Thanks

EDIT: Found this:

preg_match_all("/\((([^()]*|(?R))*)\)/", $rejoin, $matches);

Very useful; it comes from the link provided in the accepted answer.
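
For anyone reading the result of that call, here is a minimal sketch of how the recursive pattern behaves. The input string $text is an assumed sample with two top-level groups, not something from my real data:

```php
<?php
// $text is an assumed sample input with two top-level groups.
$text = 'a (b(c) d) e (f)';

// Same pattern as in the edit above. (?R) recurses into the whole pattern,
// so each nested "(...)" is consumed as part of its enclosing group.
preg_match_all('/\((([^()]*|(?R))*)\)/', $text, $matches);

// $matches[0] holds each top-level group including its outer parentheses,
// $matches[1] holds the text inside those parentheses.
print_r($matches[1]);
```

The nested (c) is not reported as a separate match, because it has already been consumed inside the first group.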

Upvotes: 3

Views: 2298

Answers (4)

Steve Kinzey

Reputation: 371

I think I understand the question: you would like to extract "i want(to) extract this text", or something similar, from a string such as: this is the text that (i want(to) extract this text) from

If so, you might find success with the following regular expression ($text is the variable being examined, $t is the array that receives the match, and $txt holds the matched text):

if (preg_match('/\(\w+.+\)/', $text, $t)) {
    $txt = $t[0];
} else {
    $txt = "";
}
echo $desired = substr($txt, 1, -1);

The regex at the root of this is \(\w+.+\), and here is an explanation of the code:

  1. Match the character “(” literally «\(»
  2. Match a single “word character” (letter, digit, or underscore) «\w», between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
  3. Match any single character that is not a line break «.», between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
  4. Match the character “)” literally «\)»
  5. Put the text within the parentheses into a new variable $desired and display it, by taking a substring that drops one character from each end, thereby removing the bounding parentheses «echo $desired=substr($txt,1,-1)»

Using the above I was able to display "i want(to) extract this text" from the variable $text = "this is the text that (i want(to) extract this text) from". If you also want to pull the "to" out of (to), I would suggest running the result through the regex in a loop until the pattern no longer matches, and concatenating the returned values to form the variable of interest.
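
A rough sketch of that loop, under the assumption that each level contains only one bracketed group; the array name $levels is made up for illustration. Because .+ is greedy, the pattern always runs to the last ) on the line, so two separate groups at the same level would be merged into one match:

```php
<?php
$text = 'this is the text that (i want(to) extract this text) from';
$levels = []; // hypothetical name: one entry per nesting level, outermost first

// Re-apply the same pattern to each extracted piece until nothing matches.
// The piece gets shorter on every pass, so the loop always terminates.
while (preg_match('/\(\w+.+\)/', $text, $t)) {
    $text = substr($t[0], 1, -1); // strip the bounding parentheses
    $levels[] = $text;
}

print_r($levels);
```

Note that the pattern needs at least two characters between the parentheses (one for \w+ and one for .+), so a group like (a) is skipped by this approach.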

Best of luck, Steve

Upvotes: 0

anubhava

Reputation: 784908

You will need recursive subpatterns to solve this. Here is the regex that should work for you:

$str = 'this is the text that (i want(to) extract this text) from';
if (preg_match('/\s* \( ( (?: [^()]* | (?0) )+ ) \) /x', $str, $arr))
   var_dump($arr[1]);

OUTPUT:

string(28) "i want(to) extract this text"

Upvotes: 2

Anirudha

Reputation: 32787

Yes, you can use this pattern:

   v                   v
 (\([^\)\(]*)+([^\)\(]*\))+
 ------------ -------------
      |            |
      |            |->match all (right)brackets to the right..
      |
      |->match all (left)brackets to the left



The above pattern won't work if you have deeper nesting like this:

(i want(to) (extract and also (this)) this text)
                              ------
            -------------------------

In this case you can use the recursive pattern as recommended by elclanrs.


You can also do it without using regex, by maintaining a count of the number of ( and ).

So, assume noOfLB is the count of ( and noOfRB is the count of )

  • iterate over each character in the string, remembering the position of the first (
  • increment noOfLB if you find (
  • increment noOfRB if you find )
  • if noOfLB==noOfRB, you have found the ) that closes the first (

I don't know PHP, so I have implemented the above algorithm in C#:

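public static string getFirstRecursivePattern(string input)
{
    // Position of the first "("; -1 if there is none, which skips the loop.
    int firstB = input.IndexOf("("), noOfLB = 0, noOfRB = 0;
    for (int i = firstB; i < input.Length && i >= 0; i++)
    {
        if (input[i] == '(') noOfLB++;
        if (input[i] == ')') noOfRB++;
        // The counts are equal once the first "(" has been closed.
        if (noOfLB == noOfRB) return input.Substring(firstB, i - firstB + 1);
    }
    return ""; // no "(" at all, or the brackets are unbalanced
}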
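
Since the question is about PHP, here is a sketch of the same counting idea in PHP. The function name extractTopLevelGroups is hypothetical, and unlike the C# version it is assumed that you want every top-level group, returned without its outer parentheses:

```php
<?php
// Hypothetical helper (not part of the C# code above): returns the inner text
// of every top-level parenthesized group in $input, using a depth counter.
function extractTopLevelGroups(string $input): array
{
    $groups = [];
    $depth = 0;   // number of "(" that are currently open
    $start = -1;  // offset of the "(" that opened the current top-level group

    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '(') {
            if ($depth === 0) {
                $start = $i;
            }
            $depth++;
        } elseif ($char === ')' && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                // Keep only the text between the outer parentheses.
                $groups[] = substr($input, $start + 1, $i - $start - 1);
            }
        }
    }

    return $groups; // a "(" that is never closed contributes nothing
}

print_r(extractTopLevelGroups('this is the text that (i want(to) extract this text) from'));
```

A single depth counter is used instead of two separate counts; it is equivalent to comparing noOfLB with noOfRB, and a stray ) before any ( is simply ignored.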

Upvotes: 6

go-oleg

Reputation: 19480

You can also use substrings:

$yourString = "this is the text that (i want(to) extract this text) from";

$stringAfterFirstParen = substr( strstr( $yourString, "(" ), 1 );

$indexOfLastParen = strrpos( $stringAfterFirstParen, ")" );

$stringBetweenParens = substr( $stringAfterFirstParen, 0, $indexOfLastParen );
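
The same idea can be wrapped in a function that also copes with input that has no parentheses. This is only a sketch, and the function name betweenOuterParens is hypothetical. Like the lines above, it takes everything from the first ( to the last ), so two separate groups in one string are returned as a single piece:

```php
<?php
// Hypothetical helper: text between the first "(" and the last ")",
// or null when there is no such pair.
function betweenOuterParens(string $s): ?string
{
    $open = strpos($s, '(');
    $close = strrpos($s, ')');

    // strpos/strrpos return false when the character is missing.
    if ($open === false || $close === false || $close < $open) {
        return null;
    }

    return substr($s, $open + 1, $close - $open - 1);
}

var_dump(betweenOuterParens('this is the text that (i want(to) extract this text) from'));
```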

Upvotes: 0
