Paddy Hallihan
Paddy Hallihan

Reputation: 1686

PHP / SQL - Improving search feature / fuzzy search

I am trying to create a product search for my site, where a user can search for products in multiple languages and (hopefully) get fuzzy search results if there is no exact match.

So first I do a basic 'LIKE' SQL query to see if there are matches, if there are no results from this I query all the products and create an array sorted by their similarity using the similar_text() function

For example I search 'shirt' this is fine if the meta for this product just includes the word 'shirt', but if the meta includes 'blue branded tshirt' this is being more descriptive and gives the user a chance to search by brand but means that the search will more than likely go fuzzy rather than be found with a LIKE SQL query.

This is kind of working but I was wondering how this could be improved, is there a better way of searching or how do people normally do it? Should I be splitting the meta into each individual keywords and try to see how many words match rather than matching the term to the whole meta?

    $ids = [];

    $params = ['%'.$term.'%'];
    $sql = "SELECT * FROM pro_search WHERE $lang LIKE ?";
    $stmt = DB::run($sql,$params);

    $count = $stmt->rowCount();
    if($count > 0){

        // product search
        while ($row = $stmt->fetch(PDO::FETCH_ASSOC)){
            $id = $row["pro_id"];
            array_push($ids,$id);
        }
        show_products($ids);

    }else{

        // product fuzzy search
        $sql = "SELECT * FROM pro_search";
        $stmt = DB::run($sql);
        while ($row = $stmt->fetch(PDO::FETCH_ASSOC)){
            $id = $row["pro_id"];
            $result = $row[$lang];
            similar_text($term,$result,$similarity);
            $similar_array[$similarity][] = $id;
        }

        $closest_match = array_keys($similar_array);
        rsort($closest_match);
        $match_count = count($closest_match);

        for($i=0; $i<$match_count; $i++){
            foreach($similar_array[$closest_match[$i]] as $id){
                array_push($ids,$id);
            }
        }
        show_products($ids);
    }

I have asked similar questions before and people have pointed me to different ways of comparing the term against the meta (such as levenshtein), but everything I've seen has been comparing two simple words (like apples and oranges) and this just isn't good enough for a real life application with thousands of products and a user could search for literally anything (as in $term='literally anything';)

Key Questions:

Upvotes: 5

Views: 5737

Answers (3)

user2203703
user2203703

Reputation: 2005

You are looking for Full-Text Searches WITH QUERY EXPANSION

MySQL supports text searching by using the LIKE operator and regular expression. However, when the text column is large and the number of rows in a table is increased, using these methods has some limitations:

  • Performance: MySQL has to scan the whole table to find the exact text based on a pattern in the LIKE statement or pattern in the regular expressions.
  • Flexible search: with the LIKE operator and regular expression searches, it is difficult to have a flexible search query e.g., to find product whose description contains car but not classic.
  • Relevance ranking: there is no way to specify which row in the result set is more relevant to the search terms.

Because of these limitations, MySQL extended a very nice feature so-called full-text search. Technically, MySQL creates an index from the words of the enabled full-text search columns and performs searches on this index. MySQL uses a sophisticated algorithm to determine the rows matched against the search query.

To do that, the columns that will be used for search must be in TEXT type and index of type FULLTEXT, index can be given using ALTER TABLE or CREATE INDEX and if you are using phpMyAdmin to manage your databases, you can do that by going to the Structure of that table, then click on More under Action of that column and choose Fulltext.

After that you can performe a search using MATCH AGAINST syntax. MATCH() takes the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform.

Full-Text Searches WITH QUERY EXPANSION:

In some cases, users want to search for information based on the knowledge that they have. Users use their experience to define keywords to search for information, and typically those keywords are too short.

To help users to find information based on the too-short keywords, MySQL full-text search engine introduces a concept called query expansion.

The query expansion is used to widen the search result of the full-text searches based on automatic relevance feedback (or blind query expansion). Technically, MySQL full-text search engine performs the following steps when the query expansion is used:

  • First, MySQL full-text search engine looks for all rows that match the search query.
  • Second, it checks all rows in the search result and finds the relevant words.
  • Third, it performs a search again based on the relevant words instead of the original keywords provided by the users.

The following example shows you how to search for a product whose product name or meta contains at least one word (shirt tshirt).

SELECT * FROM products WHERE MATCH(product_name,product_meta) AGAINST('shirt tshirt' WITH QUERY EXPANSION)

You can read more info in MYSQL document (the link at the beginning of the answer) and here

Also don't miss How Fine-Tuning MySQL Full-Text Search

Upvotes: 6

Saad Ahmad
Saad Ahmad

Reputation: 403

You can think a bit differently if you are still designing the system. In terms of search just do exact search and do in db as suggested earlier because that is much faster - but "learn from each interaction".

  • User types in some term
  • You do exact search, if found great.
  • if not do fuzzy search for each part of the term entered. Still not found, you do soundex. You are trying to find something! But present a long list to the user to filter through.
  • Eventually user selects one. As soon as they do, you add the term they entered to the product they selected.

So basic idea being that you are learning from each interaction and enriching your search set. Also whenever a term is user and user actually clicks on your item, you keep a count on that term-product association as your confidence in that term-product association improved.

Similarly whenever you are present an option, very easily user should be able to say "Not this" and then guide them through a pre-selected hierarchy of your items and eventually when they select one, you save their search term to that product.

So over a few months if you are getting enough users you will have a rich organic search term dataset for your product category with confidence level on each term.

Upvotes: 3

Atnaize
Atnaize

Reputation: 1816

You can use SOUNDEX in sql

SELECT * FROM users 
           WHERE SOUNDEX(job) 
LIKE CONCAT('%',SUBSTRING(SOUNDEX('Manual worker'),2),'%');

And things like Manual worka will work. You just need to ajust the value (currently 2) to match your need.

I see that you already tried the Levenshtein algorithm but you should have an eye on this adaptation (which is also compatible with UTF-8 strings)

For my case, the soundex was more efficient, it will depend on how your users will interact with your application.


But as said in the comment, third parties like ElasticSearch or Algolia can be much more efficient.

For ma part I never used it because the company do not allow us to use third parties softwares. That's why I tried both Levensthein and Soundex

Upvotes: 2

Related Questions