AKKAweb

Reputation: 3807

CakePHP 1.3: Way to remove a specific string from a text - PHP function or regular expression

I migrated a news database into a CakePHP news site I am creating. I have a problem displaying the text of the migrated articles, because the text imported into the DB contains HTML tags that control the formatting of the text inside them.

Could anyone help me find a way to remove these fragments without breaking the layout of the articles?

Basically, I would like to accomplish the following:

  1. Create a one-time-use function that I can include in my ArticlesController
  2. For example, the function would be named function fixtext(){...}
  3. When I call this function from, let's say, http://mydomain.com/articles/fixtext, all the affected rows in the Article.body column would be scanned and fixed.

The section of text I want to remove is font-size: 12pt; line-height: 115%;, which is inside the <span>...</span> tag.

I had something like this in mind, but I am not sure how to implement it:

function fixtext(){
        // boolean false; the string 'FALSE' is truthy and would not disable rendering
        $this->autoRender = false;

        $articles = $this->Article->find(
            'all',
            array(
                'fields' => array(
                        'Article.body',
                        'Article.id'
                ),
                'recursive' => -1
            )
        );

        foreach($articles as $article){
              // Per Dunhamzzz suggestion
              $text = str_replace('font-size: 12pt; line-height: 115%;', '', $article['Article']['body']);
              $this->Article->id =  $article['Article']['id'];
              // saveField() expects the bare field name, without the model alias
              $this->Article->saveField('body', $text);
        }

        $this->redirect('/');
}

I am not sure how to approach this or what the best way is.

Upvotes: 0

Views: 235

Answers (2)

deizel.

Reputation: 11232

Firstly, I would personally create a shell to accomplish this, as it is a batch job and (depending on the number of records involved) you may hit Apache's request timeout limit. Also, it's a good (fun) learning experience, and the shell can be extended to perform future maintenance tasks.

Secondly, it is a bad idea to parse HTML with (greedy) regular expressions, because the markup may be malformed. It is safer to use an HTML parser or simple string replacements. However, if the target is a small, regular string that can be pattern-matched safely (i.e. you're not trying to remove the closing </span> tags), regular expressions can work.

Something like this (untested):

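<?php
// app/vendors/shells/article.php
/**
 * Maintenance tasks for Articles
 */
class ArticleShell extends Shell {
    // load the Article model so that $this->Article is available
    public $uses = array('Article');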
/**
 * Clean HTML in articles.
 */
    public function cleanHtml(){
        // safety kill switch (comment out before running)
        $this->error('Safety stop', 'Back up the `articles` table before running this!');
        // this query will time out if you have millions of records
        $articles = $this->Article->find('all', array(
            'fields' => array(
                'Article.name',
                'Article.body',
                'Article.id'
            ),
            'recursive' => -1,
        ));
        // loop and do stuff
        foreach ($articles as $article) {
            $this->out('Processing ' . $article['Article']['name'] . ' ... ');
            $article['Article']['body'] = $this->_removeInlineStyles($article['Article']['body']);
            $this->Article->id = $article['Article']['id'];
            $saved = $this->Article->saveField('body', $article['Article']['body']);
            $status = ($saved) ? 'done' : 'fail';
            $this->out($status);
        }
    }
/**
 * Removes inline CSS styles added by naughty WYSIWYG editors (or pasting from Word!)
 */
    protected function _removeInlineStyles($html) {
        // PHP's PCRE has no `g` flag; preg_replace() replaces every match by default
        $html = preg_replace('/ style="[^"]*"/i', '', $html);
        return $html;
    }
}
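
For comparison, here is a rough sketch of the HTML-parser route mentioned above, using PHP's bundled DOMDocument instead of a regular expression. The function name removeSpanStyles() is hypothetical and not part of the shell above, saveHTML() with a node argument needs PHP 5.3.6 or later, and the sketch assumes the article bodies are UTF-8 fragments. Note that it drops the whole style attribute from every <span>, which is broader than removing only the two declarations in the question.

```php
<?php
// Hypothetical helper (an assumption, not from the answer above): removes the
// style attribute from every <span> in an HTML fragment using DOMDocument.
function removeSpanStyles($html) {
    $doc = new DOMDocument();
    // Imported markup may be malformed, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make loadHTML() read the input as
    // UTF-8; the wrapper <div> lets us recover the fragment afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('span') as $span) {
        $span->removeAttribute('style');
    }

    // Serialize only the children of the wrapper, not the <html>/<body>
    // elements that loadHTML() adds around the fragment.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeSpanStyles('<p><span style="font-size: 12pt; line-height: 115%;">Hi</span></p>'), "\n";
```

In the shell, a helper like this could stand in for _removeInlineStyles(). A stray closing </div> in an article body could end the wrapper early, so it is worth spot-checking a few converted rows before saving everything.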

Upvotes: 2

Dunhamzzz

Reputation: 14808

You can use a simple str_replace() to cut that piece of text out.

foreach($articles as $article){
    // select the row to update, then save only the cleaned body field
    $this->Article->id = $article['Article']['id'];
    $this->Article->saveField(
        'body',
        str_replace('font-size: 12pt; line-height: 115%;', '', $article['Article']['body'])
    );
}

This assumes the text is identical in every case; otherwise you will need something a bit more complicated with regular expressions (or maybe multiple str_replace() calls to remove each bad property).
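
If the spacing does vary between articles, one possible shape for the regular-expression variant is sketched below. The function name stripUnwantedDeclarations() is made up for illustration, and the patterns assume the values are always exactly 12pt and 115%. They also match the same text anywhere in the body, not only inside a style attribute, so check a few rows by hand first.

```php
<?php
// Hypothetical helper: strips the two unwanted CSS declarations, tolerating
// differences in whitespace and an optional trailing semicolon.
function stripUnwantedDeclarations($html) {
    $patterns = array(
        '/font-size\s*:\s*12pt\s*;?\s*/i',
        '/line-height\s*:\s*115%\s*;?\s*/i',
    );
    // preg_replace() accepts an array of patterns and applies them in order.
    return preg_replace($patterns, '', $html);
}

$before = '<span style="font-size:12pt;line-height: 115%; color: red;">Hello</span>';
echo stripUnwantedDeclarations($before), "\n";
```

Any other declarations in the same style attribute (color: red; in this sample) are left untouched.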

Upvotes: 1
