crackerplace

Reputation: 5475

How to achieve a DISTINCT option in a Tarantool query

Articles are parsed from an RSS feed, and each article may fall into several categories. Each article also has some metadata, such as source, upstream, etc.

Below is how we are designing the spaces. Each article is inserted into the articles space.


articles space

urlhash | article.content
abcdef | { dummy content}

primary key urlhash = hash(article.url).


In the category_articles space, we insert each article once for every category it falls into.

category_articles

source | category | urlhash | timestamp
bbc | arts | article1 | 27777
bbc | mobile | article8 | 27777
bbc | phone | article3 | 27778
nyt | sound | article7 | 36667
nyt | speaker | article7 | 45556

primary key = {source, category, urlhash}
secondary key = {source, category, timestamp}

I need the latest articles for a given source and, optionally, a category. Below is how I framed the query.

box.space.category_articles.index.secondary:select({'nyt', 'speaker'}, {iterator = 'LE', limit = 5})

Now I get article7 twice in the result. Currently I filter out the duplicate results in my code. How can I get a distinct(urlhash) kind of option in Tarantool?
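The duplicate can be reproduced without a running Tarantool instance. The plain-Lua sketch below emulates what an 'LE' scan from the partial key {'nyt', 'speaker'} over the {source, category, timestamp} index would return for the sample rows, assuming a TREE index compares only the supplied key parts and yields matches in descending key order. It models the ordering only and is not Tarantool's implementation; `rows`, `compare_prefix` and `scan_le` are illustrative names.

```lua
-- Sample category_articles rows: {source, category, urlhash, timestamp}
local rows = {
    {'bbc', 'arts',    'article1', 27777},
    {'bbc', 'mobile',  'article8', 27777},
    {'bbc', 'phone',   'article3', 27778},
    {'nyt', 'sound',   'article7', 36667},
    {'nyt', 'speaker', 'article7', 45556},
}

-- Tuple fields that make up the secondary key {source, category, timestamp}
local KEY_FIELDS = {1, 2, 4}

-- Compare a row's key prefix with a (possibly partial) search key.
-- Returns -1, 0 or 1; only the parts present in `key` are compared.
local function compare_prefix(row, key)
    for i = 1, #key do
        local a, b = row[KEY_FIELDS[i]], key[i]
        if a < b then return -1 elseif a > b then return 1 end
    end
    return 0
end

-- Emulate an 'LE' scan: rows whose key prefix is <= key,
-- in descending secondary-key order, at most `limit` of them.
local function scan_le(key, limit)
    local sorted = {}
    for _, r in ipairs(rows) do sorted[#sorted + 1] = r end
    table.sort(sorted, function(x, y)
        for _, f in ipairs(KEY_FIELDS) do
            if x[f] ~= y[f] then return x[f] > y[f] end
        end
        return false
    end)
    local out = {}
    for _, r in ipairs(sorted) do
        if compare_prefix(r, key) <= 0 then
            out[#out + 1] = r
            if #out == limit then break end
        end
    end
    return out
end

local result = scan_le({'nyt', 'speaker'}, 5)
for _, r in ipairs(result) do print(r[1], r[2], r[3], r[4]) end
```

Under that assumption, article7 comes back from both the speaker and the sound rows, and the scan then continues into the bbc rows, because every key below {'nyt', 'speaker'} satisfies LE.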

Upvotes: 0

Views: 390

Answers (2)

crackerplace

Reputation: 5475

I found a better solution: iterate over the index with its pairs function and filter the articles (tracking the ones already seen in a Lua table) until I have the required number of unique articles.

index_object:pairs([key[, iterator-type]])
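A minimal sketch of that approach, assuming the values returned by `index_object:pairs()` can be consumed directly by a generic `for` loop (as in the documented `for _, tuple in index:pairs(...)` idiom). `take_distinct` is a hypothetical helper name; the demo feeds it a plain Lua table through `ipairs` so it runs without Tarantool.

```lua
-- Collect up to `n` tuples with distinct values in field `field` from any
-- generic-for iterator triple (gen, param, state), stopping as soon as
-- enough unique values have been seen.
local function take_distinct(n, field, gen, param, state)
    local seen, result = {}, {}
    for _, tuple in gen, param, state do
        local k = tuple[field]
        if not seen[k] then
            seen[k] = true
            result[#result + 1] = tuple
            if #result >= n then break end
        end
    end
    return result
end

-- Plain-Lua stand-in for the rows an LE scan from {'nyt', 'speaker'}
-- would yield, newest first: {source, category, urlhash, timestamp}
local scanned = {
    {'nyt', 'speaker', 'article7', 45556},
    {'nyt', 'sound',   'article7', 36667},
    {'bbc', 'phone',   'article3', 27778},
    {'bbc', 'mobile',  'article8', 27777},
    {'bbc', 'arts',    'article1', 27777},
}

-- Field 3 is urlhash; ask for three distinct articles
local top = take_distinct(3, 3, ipairs(scanned))
for _, t in ipairs(top) do print(t[3]) end
```

Inside Tarantool the call would look something like `take_distinct(5, 3, box.space.category_articles.index.secondary:pairs({'nyt', 'speaker'}, {iterator = 'LE'}))`. Unlike `select` with `limit = 5`, no limit is placed on the scan itself, so the loop keeps reading until it has five distinct articles or the index runs out.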

Upvotes: 1

Vasiliy Soshnikov

Reputation: 504

There are two possible options:

  1. The first one is filtering everything on the client side.
  2. The second one is using a Lua stored procedure, for example:

    function select_with_distinct()
        local ca = box.space.category_articles
        -- select(key, options): the key and the options are separate arguments
        for _, v in ipairs(ca.index.secondary:select({'nyt', 'speaker'}, {iterator = 'LE', limit = 5})) do
            -- filtering ...
        end
    end

Upvotes: 0
