Sumeet Gavhale

Reputation: 800

Simultaneous Preg_replace operation in php and regex

I know many users have asked this type of question, but I am stuck in an odd situation.

I am trying to implement logic where multiple occurrences of a specific pattern, each with a unique identifier, are replaced with the corresponding database content if a match is found.

My regex pattern is

'/{code#(\d+)}/'

where (\d+) captures the unique identifier of each occurrence of the pattern.
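For reference, this is what the pattern captures on the sample text. It is a small sketch using only preg_match_all(): $m[0] holds the full placeholders and $m[1] holds the numeric IDs, which arrive as strings.

```php
<?php
// What '/{code#(\d+)}/' captures on the sample text.
$text = "The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}";
preg_match_all('/{code#(\d+)}/', $text, $m);

print_r($m[0]); // full placeholders: {code#1}, {code#2}, {code#3}
print_r($m[1]); // captured IDs as strings: 1, 2, 3
```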

My Php code is:

<?php

 $text="The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}"; 
 $newsld=preg_match_all('/{code#(\d+)}/',$text,$arr);
 $data = array("first Replace","Second Replace", "Third Replace");
 echo $data=str_replace($arr[0], $data, $text);

?>

This works, but it is not dynamic at all. The numbers after the # in the pattern are IDs (i.e. 1, 2 and 3), and their respective data is stored in the database.

How can I fetch the content for each ID mentioned in the pattern from the database and replace the entire pattern with that content?

I really can't figure out a way to do this. Thank you in advance.

Upvotes: 1

Views: 431

Answers (2)

hakre

Reputation: 197767

Here is another way to solve the problem: as database access is the most expensive part, I would choose a design that lets you query the database only once for all the codes used.

The text you've got can be represented as a sequence of segments, that is, any combination of <TEXT> and <CODE> tokens:

The old version is {code#1}, The new version is {code#2}, ...
<TEXT_____________><CODE__><TEXT_______________><CODE__><TEXT_ ...

Tokenizing your string buffer into such a sequence lets you obtain the codes used in the document and index which segments each code relates to.

You can then fetch the replacement for each code and replace all segments of that code with it.
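A different way to get the same TEXT/CODE segmentation (not the loop used in this answer) is preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag. As a sketch on the same input: because the pattern has exactly one capture group, even indexes hold TEXT segments and odd indexes hold the captured code IDs, with an empty trailing TEXT segment when the input ends in a placeholder.

```php
<?php
// Split the text around the placeholders, keeping the captured IDs (group 1).
// Even indexes are TEXT segments, odd indexes are CODE IDs.
$input = "The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}";
$parts = preg_split('/{code#(\d+)}/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($parts as $i => $part) {
    $type = $i % 2 === 0 ? 'TEXT' : 'CODE';
    printf("%s %s\n", $type, var_export($part, true));
}
```

Note that this keeps only the ID, not the full "{code#N}" text, so a code without a replacement would have to be rebuilt from its ID.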

Let's set this up and define the input text, your pattern and the token types:

$input = <<<BUFFER
The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}
BUFFER;

$regex = '/{code#(\d+)}/';

const TOKEN_TEXT = 1;
const TOKEN_CODE = 2;

Next, the input is split into tokens. I use two arrays for that: one stores the type of each token ($tokens; text or code) and the other contains the string data ($segments). The input is copied into a buffer, and the buffer is consumed until it is empty:

$tokens   = [];
$segments = [];

$buffer = $input;

while (preg_match($regex, $buffer, $matches, PREG_OFFSET_CAPTURE, 0)) {
    if ($matches[0][1]) {
        $tokens[]   = TOKEN_TEXT;
        $segments[] = substr($buffer, 0, $matches[0][1]);
    }

    $tokens[]   = TOKEN_CODE;
    $segments[] = $matches[0][0];

    $buffer = substr($buffer, $matches[0][1] + strlen($matches[0][0]));
}

if (strlen($buffer)) {
    $tokens[]   = TOKEN_TEXT;
    $segments[] = $buffer;
    $buffer = "";
}

At this point, all of the input has been processed and turned into tokens and segments.

This token stream can now be used to obtain all the codes used. Additionally, all code tokens are indexed, so that for each code number it is known which segments need to be replaced. The indexing is done in the $patterns array:

$patterns = [];

foreach ($tokens as $index => $token) {
    if ($token !== TOKEN_CODE) {
        continue;
    }
    preg_match($regex, $segments[$index], $matches);
    $code              = (int)$matches[1];
    $patterns[$code][] = $index;
}
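To see what the index looks like when a code occurs more than once, here is a sketch that wraps the tokenizing loop in a function and, as a small variation, records each CODE segment in $patterns inside the same loop instead of in a second pass. The function name tokenize_and_index() and the sample input are made up for illustration.

```php
<?php
const TOKEN_TEXT = 1;
const TOKEN_CODE = 2;

// Hypothetical helper: the answer's tokenizer with the indexing folded into the loop.
function tokenize_and_index(string $input, string $regex): array
{
    $tokens = $segments = $patterns = [];
    $buffer = $input;
    while (preg_match($regex, $buffer, $matches, PREG_OFFSET_CAPTURE)) {
        if ($matches[0][1]) {                 // text before the placeholder
            $tokens[]   = TOKEN_TEXT;
            $segments[] = substr($buffer, 0, $matches[0][1]);
        }
        $tokens[]   = TOKEN_CODE;
        $segments[] = $matches[0][0];
        // code number => list of segment indexes holding that code
        $patterns[(int)$matches[1][0]][] = count($segments) - 1;
        $buffer = substr($buffer, $matches[0][1] + strlen($matches[0][0]));
    }
    if (strlen($buffer)) {                    // trailing text after the last placeholder
        $tokens[]   = TOKEN_TEXT;
        $segments[] = $buffer;
    }
    return [$tokens, $segments, $patterns];
}

[$tokens, $segments, $patterns] = tokenize_and_index('A {code#1} B {code#2} C {code#1}', '/{code#(\d+)}/');
print_r($patterns); // code 1 => segments 1 and 5, code 2 => segment 3
```

Code 1 maps to two segment indexes, so a single database row for it fills both places.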

Now that all codes have been obtained from the string, a database query can be formulated to obtain the replacement values. For this example I mock that functionality by creating a result array of rows. In practice you would fire a SELECT ... FROM ... WHERE code IN (12, 44, ...) query that fetches all results at once. Here I fake it by calculating a result:

$result = [];
foreach (array_keys($patterns) as $code) {
    $result[] = [
        'id'   => $code,
        'text' => sprintf('v%d.%d.%d%s', $code * 2 % 5 + $code % 2, 7 - 2 * $code % 5, 13 + $code, $code === 3 ? '' : '-beta'),
    ];
}
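The mocked $result above could be replaced by a real query along these lines. This is a sketch under assumptions: the table name replacements and its id/text columns are hypothetical, and an in-memory SQLite database via PDO stands in for the real server so the snippet runs on its own. One ? placeholder is generated per distinct code, so the whole lookup is still a single query.

```php
<?php
// Stand-in database: hypothetical table `replacements (id, text)` in SQLite memory.
$db = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$db->exec('CREATE TABLE replacements (id INTEGER PRIMARY KEY, text TEXT)');
$db->exec("INSERT INTO replacements (id, text) VALUES (1, 'v3.5.14-beta'), (2, 'v4.3.15-beta'), (3, 'v2.6.16')");

$patterns = [1 => [1], 2 => [3], 3 => [5]]; // code => segment indexes, as built above

// One placeholder per distinct code, then a single SELECT ... WHERE id IN (...).
$codes        = array_keys($patterns);
$placeholders = implode(',', array_fill(0, count($codes), '?'));
$stmt = $db->prepare("SELECT id, text FROM replacements WHERE id IN ($placeholders)");
$stmt->execute($codes);
$result = $stmt->fetchAll(PDO::FETCH_ASSOC); // rows shaped like the mock: ['id' => ..., 'text' => ...]
```

Because the loop that follows looks rows up by $row['id'], the order in which the database returns them doesn't matter.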

All that's left is to process the database result and replace the segments for each code the result contains:

foreach ($result as $row) {
    foreach ($patterns[$row['id']] as $index) {
        $segments[$index] = $row['text'];
    }
}

Finally, output the result:

echo implode("", $segments);

That's it. The output for this example is:

The old version is v3.5.14-beta, The new version is v4.3.15-beta, The stable version is v2.6.16
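A side effect of replacing segments by code is that a code with no row in the result simply keeps its original placeholder text, and the other codes are unaffected. A sketch with made-up data where code 2 has no row:

```php
<?php
// Made-up token data: segments and code index as the tokenizer would build them.
$segments = ['foo ', '{code#1}', ', bar ', '{code#2}', ', baz ', '{code#3}'];
$patterns = [1 => [1], 2 => [3], 3 => [5]];
$result   = [['id' => 1, 'text' => 'AAA'], ['id' => 3, 'text' => 'CCC']]; // no row for code 2

// Same replacement loop as above: only codes present in $result are touched.
foreach ($result as $row) {
    foreach ($patterns[$row['id']] as $index) {
        $segments[$index] = $row['text'];
    }
}
echo implode('', $segments), "\n"; // foo AAA, bar {code#2}, baz CCC
```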

The complete example:

<?php
/**
 * Simultaneous Preg_replace operation in php and regex
 *
 * @link http://stackoverflow.com/a/29474371/367456
 */

$input = <<<BUFFER
The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}
BUFFER;

$regex = '/{code#(\d+)}/';

const TOKEN_TEXT = 1;
const TOKEN_CODE = 2;

// convert the input into a stream of tokens - normal text or fields for replacement

$tokens   = [];
$segments = [];

$buffer = $input;

while (preg_match($regex, $buffer, $matches, PREG_OFFSET_CAPTURE, 0)) {
    if ($matches[0][1]) {
        $tokens[]   = TOKEN_TEXT;
        $segments[] = substr($buffer, 0, $matches[0][1]);
    }

    $tokens[]   = TOKEN_CODE;
    $segments[] = $matches[0][0];

    $buffer = substr($buffer, $matches[0][1] + strlen($matches[0][0]));
}

if (strlen($buffer)) {
    $tokens[]   = TOKEN_TEXT;
    $segments[] = $buffer;
    $buffer = "";
}

// index which tokens represent which codes

$patterns = [];

foreach ($tokens as $index => $token) {
    if ($token !== TOKEN_CODE) {
        continue;
    }
    preg_match($regex, $segments[$index], $matches);
    $code              = (int)$matches[1];
    $patterns[$code][] = $index;
}

// lookup all codes in a database at once (simulated)
// SELECT id, text FROM replacements_table WHERE id IN (array_keys($patterns))
$result = [];
foreach (array_keys($patterns) as $code) {
    $result[] = [
        'id'   => $code,
        'text' => sprintf('v%d.%d.%d%s', $code * 2 % 5 + $code % 2, 7 - 2 * $code % 5, 13 + $code, $code === 3 ? '' : '-beta'),
    ];
}

// process the database result

foreach ($result as $row) {
    foreach ($patterns[$row['id']] as $index) {
        $segments[$index] = $row['text'];
    }
}

// output the replacement result

echo implode("", $segments);

Upvotes: 2

HamZa

Reputation: 14921

It's not that difficult if you think about it. I'll be using PDO with prepared statements, so let's set it up:

$db = new PDO(                                      // New PDO object
    'mysql:host=localhost;dbname=projectn;charset=utf8', // Important: utf8 all the way through
    'username',
    'password',
    array(
        PDO::ATTR_EMULATE_PREPARES => false,        // Turn off prepare emulation
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
    )
);

This is the most basic setup for our DB. Check out this thread to learn more about emulated prepared statements and this external link to get started with PDO.

We get our input from somewhere; for the sake of simplicity, we'll define it here:

$text = 'The old version is {code#1}, The new version is {code#2}, The stable version is {code#3}';

Now there are several ways to achieve our goal. I'll show you two:

1. Using preg_replace_callback():

$output = preg_replace_callback('/{code#(\d+)}/', function($m) use($db) {
    $stmt = $db->prepare('SELECT `content` FROM `footable` WHERE `id`=?');
    $stmt->execute(array($m[1]));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if($row === false){
        return $m[0]; // Default value is the code we captured if there's no match in the DB
    }else{
        return $row['content'];
    }
}, $text);

echo $output;

Note how we use use() to bring $db into the scope of the anonymous function. global is evil.

The downside is that this code queries the database once for every single code it encounters. The advantage is that you can set a default value in case there's no match in the database. If you don't have that many codes to replace, I would go for this solution.
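A small variation on approach 1, not part of the answer above: the statement can be prepared once outside the callback and only executed per code, which saves the repeated prepare calls but still issues one query per placeholder. In this sketch an in-memory SQLite database stands in for the MySQL footable so it runs without a server, and the sample rows are made up.

```php
<?php
// Stand-in for the MySQL `footable`; code 2 deliberately has no row.
$db = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$db->exec('CREATE TABLE footable (id INTEGER PRIMARY KEY, content TEXT)');
$db->exec("INSERT INTO footable (id, content) VALUES (1, 'AAA'), (3, 'CCC')");

$text = 'foo {code#1}, bar {code#2}, baz {code#3}';
$stmt = $db->prepare('SELECT content FROM footable WHERE id = ?'); // prepared once

$output = preg_replace_callback('/{code#(\d+)}/', function ($m) use ($stmt) {
    $stmt->execute([$m[1]]);
    $content = $stmt->fetchColumn();              // false when there is no row
    $stmt->closeCursor();                         // free the result before the next execute()
    return $content === false ? $m[0] : $content; // keep the placeholder as default
}, $text);

echo $output, "\n"; // foo AAA, bar {code#2}, baz CCC
```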

2. Using preg_match_all():

if(preg_match_all('/{code#(\d+)}/', $text, $m)){
    $codes = $m[1];     // For sanity/tracking purposes
    $inQuery = implode(',', array_fill(0, count($codes), '?')); // Nice common trick: https://stackoverflow.com/a/10722827
    $stmt = $db->prepare('SELECT `content` FROM `footable` WHERE `id` IN(' . $inQuery . ')');
    $stmt->execute($codes);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $contents = array_map(function($v){
        return $v['content'];
    }, $rows); // Get the content in a nice (numbered) array

    $patterns = array_fill(0, count($codes), '/{code#(\d+)}/'); // Create an array of the same pattern N times (N = the amount of codes we have)
    $text = preg_replace($patterns, $contents, $text, 1); // Do not forget to limit a replace to 1 (for each code)
    echo $text;
}else{
    echo 'no match';
}

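The problem with the code above is that a code with no match in the database does not get its own (empty) replacement: the remaining values shift up, so later codes receive the wrong content, and the last code is replaced with an empty string. Without an ORDER BY, the database also doesn't guarantee that the rows come back in the same order as the codes. Example (code#2 doesn't exist in the DB):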

Input: foo {code#1}, bar {code#2}, baz {code#3}
Output: foo AAA, bar CCC, baz
Expected output: foo AAA, bar , baz CCC

The preg_replace_callback() version works as expected. Maybe you could think of a hybrid solution; I'll leave that as homework for you :)
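One possible hybrid, as a sketch rather than the definitive answer: run a single IN (...) query like approach 2, but also select the id column, fetch the rows as an id => content map with PDO::FETCH_KEY_PAIR, and then do the replacement with preg_replace_callback() as in approach 1. Rows are looked up by ID, so neither missing codes nor the row order can shift the values. An in-memory SQLite database stands in for MySQL, and the sample rows are made up.

```php
<?php
// Stand-in for the MySQL `footable`; code 2 deliberately has no row.
$db = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$db->exec('CREATE TABLE footable (id INTEGER PRIMARY KEY, content TEXT)');
$db->exec("INSERT INTO footable (id, content) VALUES (1, 'AAA'), (3, 'CCC')");

$text   = 'foo {code#1}, bar {code#2}, baz {code#3}';
$regex  = '/{code#(\d+)}/';
$output = $text;

if (preg_match_all($regex, $text, $m)) {
    $codes   = array_values(array_unique($m[1]));  // each distinct ID once
    $inQuery = implode(',', array_fill(0, count($codes), '?'));
    $stmt    = $db->prepare("SELECT id, content FROM footable WHERE id IN ($inQuery)");
    $stmt->execute($codes);
    // FETCH_KEY_PAIR: first column (id) becomes the key, second (content) the value.
    $contents = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);

    $output = preg_replace_callback($regex, function ($match) use ($contents) {
        return $contents[$match[1]] ?? $match[0];   // missing ID keeps its placeholder
    }, $text);
}

echo $output, "\n"; // foo AAA, bar {code#2}, baz CCC
```

A missing ID keeps its placeholder here; return '' instead of $match[0] if you prefer the empty value from the expected output above.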

Upvotes: 3
