Tags: javascript, jquery, regex, dom, full-text-search

Reputation: 1727

Full-Text Search on a Client Machine Using JavaScript

I am trying to implement full-text search on a client machine.

I've found that Lunr.js partially meets my requirements, but it only works properly when it is served from a web server such as Apache.

In my case, the client machine will not have any web server or database installed. There will just be a bunch of static HTML files in a directory, plus one index file that takes the user's input from a search box and searches for that string in those static HTML files.

A Google search turned up some terms that might be relevant to my project: innerHTML, the DOM, iframes and RegExp.

Please address these points in your answers. Thanks in advance.

Upvotes: 2

Answers (3)

Oliver Nightingale

Reputation: 1835

It sounds like your HTML files are static. If that is the case, you could also keep a JSON file containing the text of each of your HTML files.

For example, let's say you have two HTML pages, foo.html and bar.html. You could extract the relevant content from each of them and create a JSON file containing the following:

[{
    "id": "foo.html",
    "text": "whatever text is in foo.html"
},{
    "id": "bar.html",
    "text": "whatever text is in bar.html"
}]

This file would live in the same directory as your HTML files, e.g.:

- project_dir
-- foo.html
-- bar.html
-- index.json

Then you can use the index file with lunr.js.
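Lunr builds an inverted index from records like these. To show the idea without relying on Lunr's API, here is a minimal hand-rolled inverted index over the same id/text records. It is a simplified stand-in, not Lunr itself: Lunr adds stemming, stop-word filtering and relevance scoring on top of this. The names buildIndex and search are hypothetical.

```javascript
// Minimal inverted index over [{ id, text }] records, as in index.json.
// Hand-rolled illustration only; this is NOT Lunr's API.

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Map each term to the set of document ids that contain it.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const term of tokenize(doc.text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(doc.id);
    }
  }
  return index;
}

// Return the ids of documents containing every query term (AND search).
function search(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  let result = null;
  for (const term of terms) {
    const ids = index.get(term) || new Set();
    result = result === null
      ? new Set(ids)
      : new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort();
}

// Usage with the two records from index.json:
const docs = [
  { id: "foo.html", text: "whatever text is in foo.html" },
  { id: "bar.html", text: "whatever text is in bar.html" },
];
const index = buildIndex(docs);
console.log(search(index, "whatever text")); // [ 'bar.html', 'foo.html' ]
console.log(search(index, "foo"));           // [ 'foo.html' ]
```

Because the index maps terms to ids, a query only touches the entries for its own terms instead of rescanning every page's text.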

How you actually build the index.json file depends on the tools you have available; something like boilerpipe or Readability could do the extraction. More options are discussed here: http://readwrite.com/2011/03/19/text-extraction

Upvotes: 0

frequent

Reputation: 28523

Although I haven't tried using it this way, you could have a look at jIO (GitHub).

jIO can be used to manage and sync JSON documents across multiple storages (browser localStorage, WebDAV, xWiki, S3...). Storages can be indexed, and jIO comes with its own query module, called complexQueries, which can also be used standalone.

If you request your pages via Ajax and extract the full text/HTML of each page, you can just dump it into jIO as a document.

There are three ways to do this (all examples are from the jIO documentation):

1. Use plain localStorage and complex queries
Create a jIO document for every document you want to make searchable. After setting up your jIO instance:

var mySearchFiles = JIO.newJio({
    "type" : "local",
    "username" : "whatever",
    "application_name" : "fulltextsearch"
});

add the full HTML or extracted text as a document (mind the localStorage size limit) like so:

mySearchFiles.put({
    "_id": "your_id",
    "search_result_string": "page_title/page_filename",
    "searchable_text": "your_text_to_be_searched_goes_here"
}, function (err, response) {
    // console.log(response) =
    // {
    //  "ok": true,
    //  "id": "your_id"
    // }
});

Use either the _id or another custom key for whatever you want jIO to return when searching.

Then run complex queries on your jIO using the allDocs method (here is an example page to play around with complex queries):

// here you construct your basic query
var query_object = {
    "query":{
        "filter": {
            // records from/to to be returned
            "limit":[0,10],
            // field and direction to sort on
            "sort_on":[["search_result_string", "ascending"]],
            // what fields to return
            "select_list":[["search_result_string"]]
        },
        // wildcard
        "wildcard_character":'%'
    }
};

// build your query, assuming the user's input is in the variable search_term:
var search = "searchable_text: = %" + search_term + "%"; 

// add to query object
query_object.query.query = search;

// run the search
mySearchFiles.allDocs(
   query_object,
   function (err, response){
      console.log(response);
   }
);

This should return the search_result_string values you want. I don't know how fast it will be when searching large texts, but if you want, you could write your own search grammar using the JSCC parser generator.
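To make the query semantics concrete, here is a plain-JavaScript sketch of what query_object asks for: a %-wildcard match on searchable_text, an ascending sort on search_result_string, the first ten records, and only search_result_string in the output. This runs over an in-memory array and is not jIO's implementation; runQuery and likeToRegExp are hypothetical names, and the case-insensitive matching is my own choice.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn a LIKE-style pattern ("%term%") into a RegExp; each wildcard
// matches any run of characters. Case-insensitivity is an assumption.
function likeToRegExp(pattern, wildcard = "%") {
  const body = pattern.split(wildcard).map(escapeRegExp).join("[\\s\\S]*");
  return new RegExp("^" + body + "$", "i");
}

// Filter, sort, slice and project, mirroring the fields of query_object.
function runQuery(docs, searchTerm) {
  const re = likeToRegExp("%" + searchTerm + "%");
  return docs
    .filter((doc) => re.test(doc.searchable_text))
    // like "sort_on": [["search_result_string", "ascending"]]
    .sort((a, b) => a.search_result_string.localeCompare(b.search_result_string))
    // like "limit": [0, 10]
    .slice(0, 10)
    // like "select_list": keep only search_result_string
    .map((doc) => ({ search_result_string: doc.search_result_string }));
}

// Usage with made-up documents:
const docs = [
  { _id: "1", search_result_string: "Zebra page/zebra.html", searchable_text: "Stripes and grass" },
  { _id: "2", search_result_string: "Apple page/apple.html", searchable_text: "Red apples grow on grass" },
  { _id: "3", search_result_string: "Moon page/moon.html", searchable_text: "No plants here" },
];
console.log(runQuery(docs, "grass")); // logs the Apple and Zebra entries, in that order
```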

2. Use only complex queries
You can use the parse, serialize and query methods of ComplexQueries standalone. See the examples page mentioned above for how this works.

Basically, the data you want to search must be available as a list of objects, and your query must be serialized. Then just call:

var result = jIO.ComplexQueries.query(query, object_list);

Of course, you would need somewhere to keep your searchable data, so I would probably use localStorage alongside it.
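A minimal sketch of keeping the object list in localStorage, assuming localStorage is available to your page and the texts fit within the browser's per-origin storage limit. STORAGE_KEY, saveDocs and loadDocs are hypothetical names; the storage parameter defaults to localStorage but accepts any object with getItem/setItem.

```javascript
// Persist and reload the searchable object list via the Web Storage API.
// STORAGE_KEY and the function names are hypothetical.
const STORAGE_KEY = "fulltextsearch-docs";

// storage: anything with getItem/setItem (window.localStorage in a browser).
function saveDocs(docs, storage = globalThis.localStorage) {
  try {
    storage.setItem(STORAGE_KEY, JSON.stringify(docs));
    return true;
  } catch (err) {
    // setItem throws (e.g. a QuotaExceededError) when the quota is exceeded.
    console.warn("Could not save search data:", err);
    return false;
  }
}

// Returns the saved list, or an empty list if nothing has been saved yet.
function loadDocs(storage = globalThis.localStorage) {
  const raw = storage.getItem(STORAGE_KEY);
  return raw === null ? [] : JSON.parse(raw);
}
```

The list returned by loadDocs() is what you would pass as object_list to the standalone query call.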

3. Add an indexStorage on top of localStorage
You can add an index on top of localStorage like so:

mySearchFiles = JIO.newJio({
    "type": "indexed",
    "indices": [
        {"name":"index_name", "fields":["field_to_be_indexed_1"]},
        {"name":"index_name2", "fields":["field_to_be_indexed_1","field_to_be_indexed_2"]}
    ],
    "field_types": {
      "field_to_be_indexed_1": "string",
      "field_to_be_indexed_2": "string"
    },
    "sub_storage": {
      "type": "local",
      "username": "whatever",
      "application_name": "fulltextsearch"
    }
});

This will create an index for all documents you add to your localStorage, which would allow you to, for example, do a keyword search on the files before digging through all of them with complexQueries. So:

mySearchFiles.put({
    "_id": "your_id",
    "search_result_string": "page_title/page_filename",
    "field_to_be_indexed_1": "keyword",
    "field_to_be_indexed_2": "another_keyword",
    "searchable_text": "your_text_to_be_searched_goes_here"
}, function (err, response) {
    // console.log(response) =
    // {
    //  "ok": true,
    //  "id": "your_id"
    // }
});

You can call the same methods, but jIO will always try to query the index first to build the results. This is really meant for remote storage locations (search the index before requesting files over HTTP from, say, S3), but it may still be usable here.
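The idea behind the index storage (narrow the candidates with a cheap keyword lookup, then scan the full text of only those documents) can be sketched in plain JavaScript. This is a hand-rolled illustration of a two-stage lookup, not how jIO's indexed storage is implemented; buildFieldIndex and searchWithIndex are hypothetical names.

```javascript
// Stage 1: map each value of an indexed field to the ids of documents having it.
function buildFieldIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue;
    if (!index.has(value)) index.set(value, []);
    index.get(value).push(doc._id);
  }
  return index;
}

// Stage 2: scan the full text of only those documents whose keyword matched.
function searchWithIndex(docs, index, keyword, text) {
  const byId = new Map(docs.map((doc) => [doc._id, doc]));
  const needle = text.toLowerCase();
  return (index.get(keyword) || [])
    .map((id) => byId.get(id))
    .filter((doc) => doc.searchable_text.toLowerCase().includes(needle))
    .map((doc) => doc.search_result_string);
}

// Usage with made-up documents:
const docs = [
  { _id: "a", search_result_string: "Intro/intro.html", field_to_be_indexed_1: "guide", searchable_text: "Getting started with search" },
  { _id: "b", search_result_string: "API/api.html", field_to_be_indexed_1: "reference", searchable_text: "Search API reference" },
  { _id: "c", search_result_string: "FAQ/faq.html", field_to_be_indexed_1: "guide", searchable_text: "Frequently asked questions" },
];
const index = buildFieldIndex(docs, "field_to_be_indexed_1");
console.log(searchWithIndex(docs, index, "guide", "search")); // [ 'Intro/intro.html' ]
```

Document "b" also contains "search" but is never scanned, because its keyword did not match; that skipped work is the point of querying the index first.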

Let me know if you have any questions.

Upvotes: 1

Michael W

Reputation: 688

One way I can think of is to fetch the local files using XMLHttpRequest. This is not allowed by default, but Chromium, for example, can be started with the following flag:

--allow-file-access-from-files

You would have to go through all the files you want to search and implement the search manually by stripping the HTML tags and running a regex over the remaining text; that shouldn't be hard.
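A minimal sketch of that manual search, assuming you already have a file's HTML as a string. Regex-based tag stripping is approximate (it is not a real HTML parser), and stripTags, escapeRegExp and findMatches are hypothetical names. The user's input is escaped so characters like "." or "(" are matched literally.

```javascript
// Remove scripts, styles and tags, then collapse whitespace (approximate).
function stripTags(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Escape regex metacharacters in the user's input.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find every case-insensitive match and return a snippet around each one.
function findMatches(html, term, context = 20) {
  if (!term) return [];
  const text = stripTags(html);
  const re = new RegExp(escapeRegExp(term), "gi");
  const snippets = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    const start = Math.max(0, m.index - context);
    const end = Math.min(text.length, m.index + m[0].length + context);
    snippets.push(text.slice(start, end));
  }
  return snippets;
}

// Usage: the "search" inside the <script> block is not matched.
const html = "<html><body><h1>Lunr</h1><p>Search static files.</p><script>var search = 1;</script></body></html>";
console.log(findMatches(html, "search")); // [ 'Lunr Search static files.' ]
```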

I tested the following code in Chromium:

var xmlhttp = new XMLHttpRequest();
var url = "file:///your-file.html";

xmlhttp.open('GET', url, true);
xmlhttp.onerror = function (e) { console.log('Problems', e); };

xmlhttp.onreadystatechange = function () {
    // Only act once the request has completed
    if (xmlhttp.readyState !== 4) {
        return;
    }
    // file:// requests report status 0 (even on success); http(s) requests report 200
    if (xmlhttp.status === 0 || xmlhttp.status === 200) {
        console.log("Fetched: ");
        console.log(xmlhttp.responseText);
    } else {
        // .... handle other status codes
    }
};

xmlhttp.send();
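To search several files, the request above can be wrapped in a Promise and run over a list of URLs. This is a sketch under the same assumption (Chromium started with --allow-file-access-from-files). fetchText and searchFiles are hypothetical names, and the XHR constructor is a parameter only so the functions can be exercised outside a browser. Because file:// responses report status 0, the sketch treats status 0 with a non-empty body as success.

```javascript
// Fetch a file's text with XMLHttpRequest, resolving with responseText.
function fetchText(url, Xhr = XMLHttpRequest) {
  return new Promise((resolve, reject) => {
    const xhr = new Xhr();
    xhr.open("GET", url, true);
    xhr.onerror = () => reject(new Error("Request failed: " + url));
    xhr.onreadystatechange = () => {
      if (xhr.readyState !== 4) return;
      // file:// gives status 0; an empty body is treated as a failure here.
      const ok = xhr.status === 200 || (xhr.status === 0 && xhr.responseText !== "");
      if (ok) resolve(xhr.responseText);
      else reject(new Error("Could not load " + url + " (status " + xhr.status + ")"));
    };
    xhr.send();
  });
}

// Load every URL, crudely strip tags, and return the URLs whose text
// contains the term (case-insensitive). Failed loads are logged and skipped.
async function searchFiles(urls, term, Xhr = XMLHttpRequest) {
  const needle = term.toLowerCase();
  const results = await Promise.all(
    urls.map(async (url) => {
      try {
        const html = await fetchText(url, Xhr);
        const text = html.replace(/<[^>]+>/g, " ").toLowerCase();
        return text.includes(needle) ? url : null;
      } catch (err) {
        console.log("Problems", err);
        return null;
      }
    })
  );
  return results.filter((url) => url !== null);
}

// Usage in the browser (hypothetical paths):
// searchFiles(["file:///path/to/foo.html", "file:///path/to/bar.html"], "whatever")
//   .then((matches) => console.log(matches));
```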

Upvotes: 2
