Francisc
Francisc

Reputation: 80505

SQLite like with Regular Expressions

I have a column with HTML content. I want to search for words in that column, but only the text, not the HTML code.

For example:

(1) <p class="last">First time I went there...</p>
(2) This is a <em>very</em> subtle colour.

(1) Searching for last doesn't find it, because it's a class name, not content.
(2) Searching for very subtle will find it, ignoring HTML

Is this possible with SQLite directly?

Note: I cannot define functions.

Upvotes: 2

Views: 1445

Answers (2)

JAkk
JAkk

Reputation: 1326

Don't do it with SQLite.

Do it with your programming language, your framework that is using SQLite.

In the table, where you have the column with the html code, add additional columns for data about the html. You will have to gather the data for the extra columns, while you analyze the html with your framework.

Track data about the structure the html format does have and save in an extra column the textual content of the html data.

You can get all tags by simple REGEX:

/<?[^<>]+>?/

Checkout how you receive data by scanning the html data for tags with the regexp above and write an iterated evaluation for tag-content (i.e. if a string in the results-array starts with a "<" it´s a tag, by scanning it with /<\s*\/\s*[^>]+>/ you will see if it is a ending tag and by scanning it with /<\s*[^\/>]+\s*\/\s*>/ you will see if it is a single closed tag. If none of the differentiated states does apply, it is textual content.

Upvotes: 1

Sean Vieira
Sean Vieira

Reputation: 160073

There isn't a good way to do that in SQLite directly (you'd need to build a SQLite extension that would parse the HTML and let you search through it like MSSQL's XML field type).

Your best bet is going to be to parse the HTML in your code and write out all the text into a separate column to search on as @Kevin suggests in the comments.

E.g.

 ID | HTML                                   | Text
 ---------------------------------------------------------------------------
 1  | <p class="last">First time ...</p>     | First time ...
 2  | This is a <em>very</em> subtle colour. | This is a very subtle colour.

Upvotes: 0

Related Questions