VJAI
VJAI

Reputation: 32768

Search algorithm based on word/tag match, last access and frequency

I'm working on a small mobile app that is used to store secrets. The secrets are of different types: simple (plain text), passwords and images. Each secret is connected through one or many by tags. I've a search textbox in the home page where user can type some text for searching the secrets.

At a simple level, I can search the persisted secrets based on string match over description or tag. It makes sense the ones that matches by description has higher rank than the ones by tag. But, I've to consider couple of other factors: last access and frequency of access. I'm puzzled how these two factors affect the match.

Is there any data structure/algorithm exists to help me to sort matching entities based on description, tag, last access and frequency of access?

Upvotes: 0

Views: 214

Answers (1)

Jim Mischel
Jim Mischel

Reputation: 134005

If I understand you correctly, you want to do the search by word and tag match to obtain a list of candidates from which you will select the "best" items. Your question indicates that you will favor an exact match on the description (word?) over a tag match. Now you want to know how you'd take frequency of access and last access time into account.

You don't need a particular data structure for this purpose. Any list you can sort will work just fine. The trick is coming up with a comparison function that will take those things into account. The way the comparison function works is up to you.

The simplest comparison function would be a simple ordering based on four criteria: word match, tag match, last access, and frequency. It would look something like:

// returns 1 if item1 > item2.
// returns -1 if item1 < item2
// returns 0 if item1 == item2
int compare(item1, item2)
{
    if (item1.wordMatch && !item2.wordMatch) return 1;
    if (item2.wordMatch && !item1.wordMatch) return -1;
    // do the same with tag match
    // then check last access
    if (item1.lastAccess > item2.lastAccess) return 1;
    if (item1.lastAccess < item2.lastAccess) return -1;
    // and check access frequency
    if (item1.freq > item2.freq) return 1;
    if (item1.freq < item2.freq) return -1;
    // everything's the same
    return 0;
}

You might instead want to compute a "score" for each item. For example, a word match is worth 10 points and a tag match is worth, say 4 points. So an item that has three tag matches would have a score of 12, ranking it higher than an item that has a single exact word match.

How you quantify last access time and access frequency is up to you. You'll want to think about how important each of those things is. Should something that is accessed infrequently but was last accessed 30 seconds ago be ranked higher or lower than something that's accessed very frequently but hasn't been accessed at all for the last hour? Only you can decide how important each of those criteria is.

Once you've come up with a way of computing the score for each item, your comparison function is quite simple.

Whatever you do will require some tuning. One way to start would be something like:

10 points for an exact word match
 4 points for a tag match
subtract .01 points for every minute since the last access time, up to a maximum of 8 points.
add .01 points for each prior access (i.e. frequency count), up to a maximum of 8 points.

I'll be honest that the above is just a wild guess at something that might give reasonable results. The point is to come up with something and try it. Then do some tuning. Perhaps try other things. But the basic idea is to come up with a way to compute a score based on those four criteria.

Upvotes: 2

Related Questions