Reputation: 80505
I have a column with HTML content. I want to search for words in that column, but only the text, not the HTML code.
For example:
(1) <p class="last">First time I went there...</p>
(2) This is a <em>very</em> subtle colour.
(1) Searching for last
doesn't find it, because it's a class name, not content.
(2) Searching for very subtle
will find it, ignoring HTML
Is this possible with SQLite directly?
Note: I cannot define functions.
Upvotes: 2
Views: 1445
Reputation: 1326
Don't do it with SQLite.
Do it with your programming language, your framework that is using SQLite.
In the table, where you have the column with the html code, add additional columns for data about the html. You will have to gather the data for the extra columns, while you analyze the html with your framework.
Track data about the structure the html format does have and save in an extra column the textual content of the html data.
You can get all tags by simple REGEX:
/<?[^<>]+>?/
Checkout how you receive data by scanning the html data for tags with the regexp above and write an iterated evaluation for tag-content (i.e. if a string in the results-array starts with a "<" it´s a tag, by scanning it with /<\s*\/\s*[^>]+>/
you will see if it is a ending tag and by scanning it with /<\s*[^\/>]+\s*\/\s*>/
you will see if it is a single closed tag. If none of the differentiated states does apply, it is textual content.
Upvotes: 1
Reputation: 160073
There isn't a good way to do that in SQLite directly (you'd need to build a SQLite extension that would parse the HTML and let you search through it like MSSQL's XML field type).
Your best bet is going to be to parse the HTML in your code and write out all the text into a separate column to search on as @Kevin suggests in the comments.
E.g.
ID | HTML | Text
---------------------------------------------------------------------------
1 | <p class="last">First time ...</p> | First time ...
2 | This is a <em>very</em> subtle colour. | This is a very subtle colour.
Upvotes: 0