Reputation: 143
We are using an oracle text query to perform a search of a table based on user input.
So if the user types in "blue sky" - we want to search for each word on an OR basis, so we do - where contains(columnname,'blue or sky',1)>0
so we take the user input and replace spaces with the word 'or'
before passing to the query.
This works fine and we are able to order descending on the score to give us the most relevant entries first.
However we have then had problem with 'special' characters
- it started with a comma but then I found in the documentation there are loads of them.
So we wrote some code which detected each 'special' character
and prefixed it with an escape '\' character
. This also works ok. But then there are also the reserved words, eg AND - so if a user types in 'jack and jill'
- we convert to 'jack or and or jill'
and this gives a text query parser syntax error because of the use of the word 'and' - so then trying to cater for those too but having to try to determine where they are prefixed and suffixed with spaces so as not to pick out 'handy' for example. Except of course it could be the first or last word.....grrr there must be an easier way to do this.....
Then I read up on the {} braces option
so this escapes the whole string.
Question - can I just do this, even if there are no special characters in the string?
Also I can't see how this caters for the OR functionality we need for each word - so if I do contains(columnname,'{jack or xxxxx}',1) > 0
) it does not return anything.
Any advice will be greatfully received, thanks!
Upvotes: 2
Views: 4317
Reputation: 24581
Maybe I do not answer exactly your question, but why don't you use this way to search?
with input as (select 'blue red white' example from dual),
split_rule as (select '[^ ]+' pattern from dual),
input_array as (select /* + materialize */ regexp_substr(example,pattern,1,level) word
from input, split_rule
connect by level <= regexp_count(example,pattern)),
search_table as (select 'blue sky' item from dual
union all
select 'green grass' from dual
union all
select 'red apple' from dual
union all
select 'orange juice' from dual)
select item string_found,
word hit_by
from input_array,search_table
where item like '%'||word||'%';
This should be the same in performance; "materialize" hint to disallow Oracle to export connect by
to outside.
Still if you want to parse the string to words outside of the query - simply create an Oracle temporary table, fill it on every request with user search words (imitate "input_array" from the query above) and use it.
EDIT 1: As far as you provided us with some additional information, I update the answer. The top part remains the same, simply change the query:
1) If your rating is based on different words only use this query:
with input as (select 'blue red white' example from dual),
split_rule as (select '[^ ]+' pattern from dual),
input_array as (select /* + materialize */ regexp_substr(example,pattern,1,level) word
from input, split_rule
connect by level <= regexp_count(example,pattern)),
search_table as (select 'blue sky red' item from dual
union all
select 'green grass' from dual
union all
select 'red apple blue white' from dual
union all
select 'orange juice' from dual)
select item string_found, count(*) rate
from input_array,search_table
where item like '%'||word||'%'
group by item
order by 2 desc;
2) If your rating is based on overall amount of hits:
with input as (select 'blue red white' example from dual),
split_rule as (select '[^ ]+' pattern from dual),
input_array as (select /* + materialize */ regexp_substr(example,pattern,1,level) word
from input, split_rule
connect by level <= regexp_count(example,pattern)),
search_table as (select 'blue sky red blue blue' item from dual
union all
select 'green grass' from dual
union all
select 'red apple blue white' from dual
union all
select 'orange juice' from dual)
select item string_found, sum(regexp_count(item,word)) rate
from input_array,search_table
where item like '%'||word||'%'
group by item
order by 2 desc;
EDIT 2: To use regexp_count in Oracle 10g replace it with query like
select length(no_double_spaces) - length(replace(no_double_spaces,' ')) + 1 amount_of_words
from (select trim(regexp_replace('blue red white','[ ]+',' ')) no_double_spaces
from dual);
And now I show how to use it with a first query of Edit 1:
with input as (select 'blue red white' example from dual),
split_rule as (select '[^ ]+' pattern from dual),
input_array as (select /* + materialize */ regexp_substr(example,pattern,1,level) word
from input, split_rule
connect by level <= (select length(no_double_spaces) - length(replace(no_double_spaces,' ')) + 1 amount_of_words
from (select trim(regexp_replace(example,'[ ]+',' ')) no_double_spaces
from input)
)
),
search_table as (select 'blue sky red blue blue' item from dual
union all
select 'green grass' from dual
union all
select 'red apple blue white' from dual
union all
select 'orange juice' from dual)
select item string_found, count(*) rate
from input_array,search_table
where item like '%'||word||'%'
group by item
order by 2 desc;
Upvotes: 1