Reputation: 3807
I migrated a news database into a CakePHP news site I am creating. I have a problem displaying the text of the migrated articles, because the imported text contains HTML tags with inline styles that control its formatting.
Could anyone help me find a way to remove this markup without breaking the layout of the articles?
Basically, I would like to accomplish the following: add an action to ArticlesController, function fixtext(){...}, so that when I visit http://mydomain.com/articles/fixtext, all the affected rows in the Article.body column are scanned and fixed.
The section of text I want to remove is font-size: 12pt; line-height: 115%;, which is within the <span>...</span> tag.
I had something in mind like this, but I am not sure how to implement it:
function fixtext(){
    // Disable view rendering (a boolean; the string 'FALSE' would be truthy)
    $this->autoRender = false;
    $articles = $this->Article->find(
        'all',
        array(
            'fields' => array(
                'Article.body',
                'Article.id'
            ),
            'recursive' => -1
        )
    );
    foreach($articles as $article){
        // Per Dunhamzzz suggestion
        $text = str_replace('font-size: 12pt; line-height: 115%;', '', $article['Article']['body']);
        $this->Article->id = $article['Article']['id'];
        // saveField() takes the bare field name, without the model alias
        $this->Article->saveField('body', $text);
    }
    $this->redirect('/');
}
I am not sure how to approach this or what the best way is.
Upvotes: 0
Views: 235
Reputation: 11232
Firstly, I would personally create a shell to accomplish this, as it is a batch job and (depending on the number of records involved) you may hit Apache's request timeout limit. Also, it's a good (fun) learning experience, and the shell can be extended to perform future maintenance tasks.
Secondly, it is a bad idea to parse HTML using (greedy) regular expressions, because the markup may be malformed. It is safer to use an HTML parser or simple string replacements instead, but if you are matching a small, regular string that can be pattern-matched safely (i.e. you are not trying to remove the closing </span> tags), regular expressions can work.
Something like this (untested):
<?php
// app/vendors/shells/article.php
/**
 * Maintenance tasks for Articles
 */
class ArticleShell extends Shell {
    // Load the Article model so that $this->Article is available
    public $uses = array('Article');

    /**
     * Clean HTML in articles.
     */
    public function cleanHtml() {
        // safety kill switch (comment out before running);
        // error() prints the message and stops the shell
        $this->error('Aborted', 'Back up the `articles` table before running this!');
        // this query will time out if you have millions of records
        $articles = $this->Article->find('all', array(
            'fields' => array(
                'Article.name',
                'Article.body',
                'Article.id'
            ),
            'recursive' => -1,
        ));
        // loop over the articles, clean each body and save it back
        foreach ($articles as $article) {
            $this->out('Processing ' . $article['Article']['name'] . ' ... ');
            $article['Article']['body'] = $this->_removeInlineStyles($article['Article']['body']);
            $this->Article->id = $article['Article']['id'];
            $saved = $this->Article->saveField('body', $article['Article']['body']);
            $status = ($saved) ? 'done' : 'fail';
            $this->out($status);
        }
    }

    /**
     * Removes inline CSS styles added by naughty WYSIWYG editors (or pasting from Word!)
     */
    protected function _removeInlineStyles($html) {
        // Strips every double-quoted style attribute, not just the two properties
        // from the question. PHP's PCRE has no "g" modifier; preg_replace()
        // already replaces all matches.
        $html = preg_replace('/ style="[^"]*"/i', '', $html);
        return $html;
    }
}
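If the stored markup turns out to be too irregular for a pattern match, the parser route mentioned above might look roughly like the sketch below. It is only an illustration: stripStyleProperties() is a hypothetical helper name, it assumes PHP 5.3.6 or later with the DOM extension (for saveHTML() with a node argument), and it splits style values naively on semicolons, so values that themselves contain a semicolon would need more care.

```php
<?php
/**
 * Hypothetical helper: drops the named CSS properties from every inline
 * style attribute and removes the attribute when nothing is left.
 */
function stripStyleProperties($html, array $properties) {
    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment so it can be found again after parsing; the XML
    // declaration is a common way to hint UTF-8 to the HTML parser.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style]') as $node) {
        $kept = array();
        foreach (explode(';', $node->getAttribute('style')) as $declaration) {
            // Property name is everything before the first colon.
            $name = strtolower(trim(strstr($declaration . ':', ':', true)));
            if ($name !== '' && !in_array($name, $properties)) {
                $kept[] = trim($declaration);
            }
        }
        if ($kept) {
            $node->setAttribute('style', implode('; ', $kept) . ';');
        } else {
            $node->removeAttribute('style');
        }
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

In the shell above, a call such as stripStyleProperties($html, array('font-size', 'line-height')) could stand in for the preg_replace() line if only those two properties should go and other inline styles should be kept.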
Upvotes: 2
Reputation: 14808
You can use a simple str_replace() to cut that piece of text out:
foreach($articles as $article){
    // Select the row to update, then save only the cleaned body field
    $this->Article->id = $article['Article']['id'];
    $this->Article->saveField(
        'body',
        str_replace('font-size: 12pt; line-height: 115%;', '', $article['Article']['body'])
    );
}
This assumes the text is identical in each case; otherwise you will need something a bit more complicated with regular expressions (or maybe multiple str_replace() calls to remove each bad property).
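As a rough sketch of the regular-expression variant, the helper below removes each declaration separately and tolerates differences in spacing, letter case and the trailing semicolon. removeDeclarations() is a hypothetical name, and like str_replace() it matches anywhere in the body, not only inside style attributes.

```php
<?php
/**
 * Hypothetical helper: removes the listed "property: value" declarations
 * wherever they occur, ignoring case and whitespace differences.
 */
function removeDeclarations($html, array $declarations) {
    foreach ($declarations as $property => $value) {
        // preg_quote() escapes characters that are special in a pattern.
        $pattern = '/\s*' . preg_quote($property, '/')
                 . '\s*:\s*' . preg_quote($value, '/') . '\s*;?/i';
        $html = preg_replace($pattern, '', $html);
    }
    // Drop style attributes that ended up empty.
    return str_replace(' style=""', '', $html);
}

$body = '<span style="font-size:12pt; LINE-HEIGHT: 115%">Text</span>';
echo removeDeclarations($body, array('font-size' => '12pt', 'line-height' => '115%'));
// prints: <span>Text</span>
```

Inside the loop, the result of removeDeclarations() would simply take the place of the str_replace() call.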
Upvotes: 1