Googlebot

Splitting HTML text into characters and HTML tags (with PHP & MySQL)

I want to store an HTML text in a database, split into individual characters. Since the text is long and the process runs frequently, performance is particularly important. Thus, I need an efficient way to do this in PHP without the overhead of building multiple arrays.

The input is simple text with only a few HTML markup tags and no nested nodes; the same approach could apply to BBCode or similar markup. I would also like to be able to skip some tags during the split.

Example:

$html='This <i>is</i> a <strong>test</strong>!';

This string should be stored in the MySQL database as:

id  character  html_tag
1    T
2    h
3    i
4    s
5    (space)
6    i          italic
7    s          italic
8    (space)
9    a
10   (space)
11   t          strong
12   e          strong
13   s          strong
14   t          strong
15   !

How can I capture the individual characters, each together with its enclosing HTML tag but without the tag markup itself?

Answers (1)

maciej-ka

Parse the HTML with the fast, streaming XMLReader.

This code also works with nested tags: the $tags variable is a stack of open tags. Here I always echo the innermost tag, i.e. the last one on the stack.

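$html='This <i>is</i> a <strong>test</strong>!';

$reader=new XMLReader();
// wrap the fragment in one root element; the input must be well-formed XML
$reader->XML('<root>'.$html.'</root>');
// skip root node
$reader->read();
// stack of open tags; '' means "outside any tag"
$tags=array('');
while($reader->read())
    switch($reader->nodeType)
    {
        case XMLReader::ELEMENT:
            // self-closing tags such as <br/> never produce an END_ELEMENT
            if(!$reader->isEmptyElement)
                $tags[]=$reader->name;
            break;
        case XMLReader::END_ELEMENT:
            array_pop($tags);
            break;
        case XMLReader::TEXT:
        case XMLReader::WHITESPACE:
        case XMLReader::SIGNIFICANT_WHITESPACE:
            // split into UTF-8 characters; $value[$i] would cut multibyte characters into bytes
            foreach(preg_split('//u',$reader->value,-1,PREG_SPLIT_NO_EMPTY) as $char)
                // your insert sql here
                echo "<br/>'".$char."' ".end($tags);
            break;
    }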
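A minimal sketch of the same loop wrapped in a PHP generator, assuming PHP 7.1+ and a fragment that is well-formed XML (entities such as &nbsp; will not parse). The function name htmlCharRows() and the $skipTags parameter are hypothetical; "skipping" a tag is interpreted here as not recording it, so its text keeps the enclosing tag. Because pairs are yielded one at a time, no intermediate array of all characters is built.

```php
<?php
/**
 * Hypothetical helper: yields one [character, innermost_tag] pair per
 * UTF-8 character of an XML-well-formed fragment. Tags listed in
 * $skipTags are not recorded, so their text inherits the enclosing tag.
 */
function htmlCharRows(string $html, array $skipTags = []): Generator
{
    $reader = new XMLReader();
    $reader->XML('<root>' . $html . '</root>');
    $reader->read();            // consume the <root> element

    $tags = [''];               // stack of recorded tags; '' = no tag
    $skipped = [];              // one flag per open element: was it skipped?
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                if ($reader->isEmptyElement) {
                    break;      // <br/> etc. have no END_ELEMENT
                }
                $skip = in_array($reader->name, $skipTags, true);
                $skipped[] = $skip;
                if (!$skip) {
                    $tags[] = $reader->name;
                }
                break;
            case XMLReader::END_ELEMENT:
                // the closing </root> has no entry in $skipped
                if ($skipped !== [] && !array_pop($skipped)) {
                    array_pop($tags);
                }
                break;
            case XMLReader::TEXT:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                $tag = end($tags);
                // split by UTF-8 character, not by byte
                foreach (preg_split('//u', $reader->value, -1, PREG_SPLIT_NO_EMPTY) as $char) {
                    yield [$char, $tag];
                }
                break;
        }
    }
    $reader->close();
}
```

For example, foreach (htmlCharRows($html, ['span']) as [$char, $tag]) { ... } can hand each pair straight to an insert buffer.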

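Also, because speed is crucial, consider buffering the inserts into a single string and running them as one batch (note that CHARACTER is a reserved word in MySQL, so the column name must be quoted with backticks):

INSERT INTO tname (`character`, html_tag) VALUES ('T',''), ('h','');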
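A sketch of that batching with PDO prepared statements instead of hand-escaped strings, assuming a MySQL table named tname with the columns above. The helper names buildBatchInsert() and insertInChunks() are hypothetical, and the default chunk size of 1000 is an arbitrary starting point rather than a tuned value.

```php
<?php
/**
 * Hypothetical helper: builds one multi-row INSERT with positional
 * placeholders for a chunk of [character, html_tag] pairs.
 * `character` is back-quoted because CHARACTER is reserved in MySQL.
 */
function buildBatchInsert(array $rows, string $table = 'tname'): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('buildBatchInsert() needs at least one row');
    }
    // one "(?,?)" group per row, e.g. (?,?),(?,?) for two rows
    $placeholders = implode(',', array_fill(0, count($rows), '(?,?)'));
    $params = [];
    foreach ($rows as [$char, $tag]) {
        $params[] = $char;
        $params[] = $tag;
    }
    return ["INSERT INTO `$table` (`character`, html_tag) VALUES $placeholders", $params];
}

/**
 * Hypothetical helper: sends any iterable of [character, html_tag] pairs
 * to MySQL in chunks, so memory stays bounded and each chunk costs a
 * single statement.
 */
function insertInChunks(PDO $pdo, iterable $rows, int $chunkSize = 1000): void
{
    $chunk = [];
    foreach ($rows as $row) {
        $chunk[] = $row;
        if (count($chunk) === $chunkSize) {
            [$sql, $params] = buildBatchInsert($chunk);
            $pdo->prepare($sql)->execute($params);
            $chunk = [];
        }
    }
    if ($chunk !== []) {        // flush the last, partial chunk
        [$sql, $params] = buildBatchInsert($chunk);
        $pdo->prepare($sql)->execute($params);
    }
}
```

Placeholders let the driver handle quoting of characters such as ' and \. Wrapping the call in $pdo->beginTransaction() / $pdo->commit() usually helps further on InnoDB.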
