fischer

Reputation: 269

regex pattern extract quotes

While reworking the code of the debate forum on my website, I am going to change the way quotes are stored in the database. Now I need to come up with a regex to rearrange the posts that are already stored in my database.

Here is an example of how debate posts are currently stored in the database (with quotes nested inside quotes). Note: I have indented it for the sake of illustration:

Just citing a post
[quote]Text of quote #3
       [quote]Text of quote #2
              [quote]Text of quote #1
                     [name]User 1[/name]
              [/quote]
              [name]User 2[/name]
       [/quote]
       [name]User 3[/name]
[/quote]

What I would like is for the above to be rearranged to look like this:

Just citing a post
[quote:User 3]
      Text of quote #3
      [quote:User 2]
           Text of quote #2
           [quote:User 1]
                 Text of quote #1
           [/quote]  
      [/quote]   
[/quote]

Can any of you point me in the right direction for doing this with a regex? I am using PHP.

Thanks in advance, I appreciate all your help :)

Fischer

Upvotes: 3

Views: 329

Answers (4)

Justin Morgan

Reputation: 30715

Don't use a regex for this. What you're describing is essentially a variant of XML, and regex is not the right tool for parsing XML, because tags can nest to any depth. What you need to do is write a parser.
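A minimal sketch of what such a parser could look like, assuming the tags are exactly `[quote]`, `[/quote]` and `[name]...[/name]` as in the question. The function name `convertQuotes` is hypothetical. It splits the post into tags and text, keeps a stack of open quotes, and only emits a quote when its `[/quote]` is reached, because the author's `[name]` comes at the end of the old format. Whitespace is kept as-is.

```php
<?php
// Hypothetical stack-based converter from the old to the new quote format.
function convertQuotes(string $post): string
{
    // Split into tags and plain text, keeping the tags as separate tokens.
    $tokens = preg_split(
        '#(\[quote\]|\[/quote\]|\[name\].*?\[/name\])#s',
        $post,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $out   = '';
    $stack = []; // one frame per open quote: ['body' => text so far, 'name' => author]

    foreach ($tokens as $tok) {
        if ($tok === '[quote]') {
            $stack[] = ['body' => '', 'name' => ''];
        } elseif ($tok === '[/quote]' && $stack) {
            // Quote is complete: now we know its author, so emit the new format.
            $frame = array_pop($stack);
            $quote = '[quote:' . $frame['name'] . ']' . $frame['body'] . '[/quote]';
            if ($stack) {
                $stack[count($stack) - 1]['body'] .= $quote; // nested: append to parent
            } else {
                $out .= $quote;
            }
        } elseif ($stack && preg_match('#^\[name\](.*)\[/name\]$#s', $tok, $m)) {
            $stack[count($stack) - 1]['name'] = $m[1];
        } elseif ($stack) {
            $stack[count($stack) - 1]['body'] .= $tok;
        } else {
            $out .= $tok; // text outside any quote
        }
    }

    // Malformed input: write any unclosed [quote] back instead of dropping it.
    foreach ($stack as $frame) {
        $out .= '[quote]' . $frame['body'];
        if ($frame['name'] !== '') {
            $out .= '[name]' . $frame['name'] . '[/name]';
        }
    }
    return $out;
}
```

For example, `convertQuotes('[quote]a[quote]b[name]U1[/name][/quote][name]U2[/name][/quote]')` gives `[quote:U2]a[quote:U1]b[/quote][/quote]`.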

However, what I would suggest is using actual XML instead. It already exists, it's standardized, the syntax is almost exactly the same, and there are already a ton of parsers for it. I'd start here:

Edit: Just to clarify how easily this could become valid XML:

<quote src="User 3">
      Text of quote #3
      <quote src="User 2">
           Text of quote #2
           <quote src="User 1">
                 Text of quote #1
           </quote>  
      </quote>   
</quote>
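As a rough sketch of what "there are already parsers for it" buys you, the markup above can be read with PHP's bundled DOM extension. This assumes each post is wrapped in a single root element (the `<post>` wrapper and the function name `listQuoteAuthors` are hypothetical) so that the post is a well-formed XML document; user text would also need to be XML-escaped when stored.

```php
<?php
// Sketch: list each quote's author and nesting depth from the XML format.
function listQuoteAuthors(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Post is not well-formed XML');
    }

    $xpath   = new DOMXPath($doc);
    $authors = [];
    // '//quote' yields every <quote> element in document order (outermost first).
    foreach ($xpath->query('//quote') as $quote) {
        // Depth = number of enclosing <quote> elements.
        $depth     = (int) $xpath->evaluate('count(ancestor::quote)', $quote);
        $authors[] = [$quote->getAttribute('src'), $depth];
    }
    return $authors;
}
```

For the example above wrapped in `<post>...</post>`, this returns `[['User 3', 0], ['User 2', 1], ['User 1', 2]]`.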

Upvotes: 0

joelhardi

Reputation: 11169

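This function will do the job. It repeatedly rewrites the innermost remaining quotation, so the quotes are reformatted from the inner-most to the outer-most:

function reformat($str) {
  // Match an innermost old-style quote: its body may not contain another
  // "[quote]" opener, so each [name] is paired with its own [quote].
  $pattern = '#\[quote\]((?:(?!\[quote\]).)*?)\[name\](.*?)\[/name\]\s*\[/quote\]#s';
  while (preg_match($pattern, $str, $matches)) {
    $str = str_replace($matches[0],
                       '[quote:'.$matches[2].']'.$matches[1].'[/quote]',
                       $str);
  }
  return $str;
}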

In action:

$before = "Just citing a post
  [quote]Text of quote #3
    [quote]Text of quote #2
      [quote]Text of quote #1
        [name]User 1[/name]
      [/quote]
      [name]User 2[/name]
    [/quote]
    [name]User 3[/name]
  [/quote]";

echo reformat($before);

Outputs:

Just citing a post
  [quote:User 3]Text of quote #3
    [quote:User 2]Text of quote #2
      [quote:User 1]Text of quote #1
        [/quote]
      [/quote]
    [/quote]

Upvotes: 1

krcko

Reputation: 2834

This will do it:

$input = "Just citing a post
[quote]Text of quote #3
       [quote]Text of quote #2
              [quote]Text of quote #1
                     [name]User 1[/name]
              [/quote]
              [name]User 2[/name]
       [/quote]
       [name]User 3[/name]
[/quote]";

function fix_quotes($string) {
    $regexp = '`(\s*)\[quote\]((?:[^\[]|\[(?!quote\]))*?)\[name\](.*?)\[\/name\]\s*\[\/quote\]`';
    while (preg_match($regexp, $string)) {
        $string = preg_replace_callback($regexp, function($match) {
            return $match[1] . '[quote:' . $match[3] . ']' . trim(fix_quotes($match[2])) . $match[1] . '[/quote]';
        }, $string);
    }
    return $string;
}

echo fix_quotes($input);

Results in:

Just citing a post
[quote:User 3]Text of quote #3
       [quote:User 2]Text of quote #2
              [quote:User 1]Text of quote #1
              [/quote]
       [/quote]
[/quote]

Edit: I hadn't seen that joelhardi had already posted a similar solution; his looks a bit cleaner, so I'd stick with his :)
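A related variant, swapping in PCRE's recursive patterns (`(?R)`, supported by PHP's `preg_*` functions), matches a whole nested quote in one go and lets the callback recurse into its body, instead of looping until no match is left. This is a sketch assuming the same tag layout as the question; the function name `convertRecursive` is hypothetical.

```php
<?php
// Sketch using PCRE recursion: (?R) re-enters the whole pattern, so group 1
// may contain complete nested old-style quotes. The callback converts the
// outer quote and recurses into its body to convert the inner ones.
function convertRecursive(string $post): string
{
    // Body = characters that don't start a quote/name tag, or a full nested quote.
    $pattern = '#\[quote\]((?:(?!\[/?quote\]|\[name\]).|(?R))*)\[name\](.*?)\[/name\]\s*\[/quote\]#s';

    $result = preg_replace_callback($pattern, function (array $m): string {
        return '[quote:' . $m[2] . ']' . convertRecursive($m[1]) . '[/quote]';
    }, $post);

    // preg_replace_callback() returns null on a PCRE error; keep the input then.
    return $result ?? $post;
}
```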

Upvotes: 1

qJake

Reputation: 17129

Because of the complexity involved here (you're going to need conditionals, as well as "match/replace all" functionality), I would recommend not doing this with a regex alone. Use a programming language with good regex support, and combine regex calls with that language's loops and conditionals to do what you want. I recommend PHP.
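A small sketch of that combination, assuming the question's tag layout: `preg_replace()` does the "replace all" part and reports how many replacements it made, and an ordinary PHP loop with a condition repeats it until a pass changes nothing. The function name `convertUntilStable` is hypothetical.

```php
<?php
// Sketch: one preg_replace() pass converts every quote that is innermost
// right now; the loop repeats until a pass makes no replacements.
function convertUntilStable(string $post): string
{
    // Old-style quote whose body contains no other plain "[quote]" opener.
    // Already-converted "[quote:...]" tags are allowed inside the body.
    $innermost = '#\[quote\]((?:(?!\[quote\]).)*?)\[name\](.*?)\[/name\]\s*\[/quote\]#s';

    do {
        $next = preg_replace($innermost, '[quote:$2]$1[/quote]', $post, -1, $count);
        if ($next === null) {
            throw new RuntimeException('PCRE error: ' . preg_last_error());
        }
        $post = $next;
    } while ($count > 0);

    return $post;
}
```

Sibling quotes at the same level are converted in the same pass, so the number of passes is roughly the nesting depth plus one.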

Upvotes: 0
