Reputation: 15673
I want to store HTML text in a database, split into individual characters. Since the text is long and the process runs frequently, performance is particularly important. I therefore need an efficient way to do this in PHP, without the overhead of building multiple arrays.
Of course, the input is simple text with only a few HTML markup tags and no nested nodes. It could just as well be BBCode or something similar. I just want to be able to skip the tags during the split.
Example:
$html='This <i>is</i> a <strong>test</strong>!';
This string should be stored in a MySQL database as:
id   character   html_tag
1    T
2    h
3    i
4    s
5    (space)
6    i           italic
7    s           italic
8    (space)
9    a
10   (space)
11   t           strong
12   e           strong
13   s           strong
14   t           strong
15   !
How can I capture the individual characters without their corresponding HTML tags?
Upvotes: 1
Views: 306
Reputation: 575
Parse the HTML with XMLReader, which is fast.
This code also works with nested tags: the $tags variable is a stack of open tags. Here I always echo the most deeply nested tag, the last one on the stack.
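$html = 'This <i>is</i> a <strong>test</strong>!';
$reader = new XMLReader();
// wrap the fragment in a root element so it is well-formed XML
$reader->XML('<root>' . $html . '</root>');
// skip root node
$reader->read();
// stack of open tags; '' means "no tag"
$tags = array('');
while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            // self-closing elements such as <br/> never emit END_ELEMENT
            if (!$reader->isEmptyElement) {
                $tags[] = $reader->name;
            }
            break;
        case XMLReader::END_ELEMENT:
            array_pop($tags);
            break;
        default:
            // text content; note that strlen() and [$i] work on bytes, not UTF-8 characters
            $text = $reader->value;
            $length = strlen($text);
            for ($i = 0; $i < $length; $i++) {
                // your insert sql here
                echo "<br/>'" . $text[$i] . "' " . end($tags);
            }
            break;
    }
}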
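If the text can contain multi-byte UTF-8 characters, or the rows need to be collected rather than echoed, the same loop can be wrapped in a function. The following is only a sketch under assumptions: the name splitHtmlToRows is hypothetical, the input is assumed to be a well-formed XML fragment in UTF-8 (HTML entities such as &nbsp; would not parse), and it stores the raw tag name (i, strong) rather than a label like "italic".

```php
<?php
// Hypothetical helper (not part of the answer above): returns one
// [character, tag] pair per character of the text content.
// Assumes $html is a well-formed XML fragment encoded as UTF-8.
function splitHtmlToRows(string $html): array
{
    $reader = new XMLReader();
    $reader->XML('<root>' . $html . '</root>');
    $reader->read();              // move onto the <root> wrapper
    $tags = [''];                 // '' stands for "no tag"
    $rows = [];
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                // self-closing elements never produce END_ELEMENT
                if (!$reader->isEmptyElement) {
                    $tags[] = $reader->name;
                }
                break;
            case XMLReader::END_ELEMENT:
                array_pop($tags);
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                // the /u flag splits on UTF-8 characters, not bytes
                $chars = preg_split('//u', $reader->value, -1, PREG_SPLIT_NO_EMPTY);
                foreach ($chars as $char) {
                    $rows[] = [$char, end($tags)];
                }
                break;
        }
    }
    $reader->close();
    return $rows;
}

$rows = splitHtmlToRows('This <i>is</i> a <strong>test</strong>!');
```

Each element of $rows is a two-item array, so the caller decides how and when to write them to the database.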
Also, because speed is crucial, consider buffering the inserts into a single string and running them as one batch (character is a reserved word in MySQL, so the column name needs backticks):
INSERT INTO tname (`character`, html_tag) VALUES ('T',''),('h','');
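A batch like this can be built with placeholders instead of concatenated values, which avoids quoting problems for characters such as ' or \. This is a minimal sketch, not a definitive implementation: buildBatchInsert is a hypothetical helper, the table and column names are taken from the statement above, and a PDO connection in $pdo is assumed rather than created.

```php
<?php
// Hypothetical helper: builds one multi-row INSERT with placeholders.
// Expects $rows as a list of [character, tag] pairs.
function buildBatchInsert(array $rows): array
{
    // one "(?, ?)" group per row
    $placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));
    $sql = 'INSERT INTO tname (`character`, html_tag) VALUES ' . $placeholders;

    // flatten the pairs into a single parameter list, in row order
    $params = [];
    foreach ($rows as [$char, $tag]) {
        $params[] = $char;
        $params[] = $tag;
    }
    return [$sql, $params];
}

[$sql, $params] = buildBatchInsert([['T', ''], ['h', ''], ['i', 'i']]);
// With a PDO connection in $pdo (assumed, not created here):
// $pdo->prepare($sql)->execute($params);
```

For very long texts it is probably wise to send the rows in chunks (for example with array_chunk) rather than in one statement, since the server limits both packet size and the number of placeholders per statement.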
Upvotes: 2