ironsand
ironsand

Reputation: 15181

How to make ActiveRecord query unique by a column

I have a Company model that has many Disclosures. The Disclosure has columns named title, pdf and pdf_sha256.

class Company < ActiveRecord::Base
  has_many :disclosures
end

class Disclosure < ActiveRecord::Base
  belongs_to :company
end

I want to make it unique by pdf_sha256 and if pdf_sha256 is nil that should be treated as unique.

If it is an Array, I'll write like this.

companies_with_sha256 = company.disclosures.where.not(pdf_sha256: nil).group_by(&:pdf_sha256).map do |key,values|
  values.max_by{|v| v.title.length}
end
companies_without_sha256 = company.disclosures.where(pdf_sha256: nil)
companies = companies_with_sha256 + companeis_without_sha256

How can I get the same result by using ActiveRecord query?

Upvotes: 5

Views: 4948

Answers (6)

Shovon
Shovon

Reputation: 71

If you need a relation with distinct pdf_sha256, where you require no explicit conditions, you can use group for that -

scope :unique_pdf_sha256, -> { where.not(pdf_sha256: nil).group(:pdf_sha256) }
scope :nil_pdf_sha256,    -> { where(pdf_sha256: nil) }

You could have used or, but the relation passed to it must be structurally compatible. So even if you get same type of relations in these two scopes, you cannot use it with or.

Edit: To make it structurally compatible with each other you can see @AlexSantos 's answer

Upvotes: 0

Ernest
Ernest

Reputation: 66

Assuming you are using Rails 5 you could chain a .or command to merge both your queries.

pdf_sha256_unique_disclosures = company.disclosures.where(pdf_sha256: nil).or(company.disclosures.where.not(pdf_sha256: nil))

Then you can proceed with your group_by logic.

However, in the example above i'm not exactly sure what is the objective but I am curious to better understand how you would use the resulting companies variable.

If you wanted to have a hash of unique pdf_sha256 keys including nil, and its resultant unique disclosure document you could try the following:

sorted_disclosures = company.disclosures.group_by(&:pdf_sha256).each_with_object({}) do |entries, hash|
  hash[entries[0]] = entries[1].max_by{|v| v.title.length}
end

This should give you a resultant hash like structure similar to the group_by where your keys are all your unique pdf_sha256 and the value would be the longest named disclosure that match that pdf_sha256.

Upvotes: 1

Alex Santos
Alex Santos

Reputation: 2950

It is possible to do it in one query by first getting a different id for each different pdf_sha256 as a subquery, then in the query getting the elements within that set of ids by passing the subquery as follows:

def unique_disclosures_by_pdf_sha256(company)
  subquery = company.disclosures.select('MIN(id) as id').group(:pdf_sha256)
  company.disclosures.where(id: subquery)
    .or(company.disclosures.where(pdf_sha256: nil))
end

The great thing about this is that ActiveRecord is lazy loaded, so the first subquery will not be run and will be merged to the second main query to create a single query in the database. It will then retrieve all the disclosures unique by pdf_sha256 plus all the ones that have pdf_sha256 set to nil.

In case you are curious, given a company, the resulting query will be something like:

SELECT  "disclosures".* FROM "disclosures" 
WHERE (
  "disclosures"."company_id" = $1 AND "disclosures"."id" IN (
    SELECT MAX(id) as id FROM "disclosures" WHERE "disclosures"."company_id" = $2 GROUP BY "disclosures"."pdf_sha256"
  ) 
  OR "disclosures"."company_id" = $3 AND "disclosures"."pdf_sha256" IS NULL
)

The great thing about this solution is that the returned value is an ActiveRecord query, so it won't be loaded until you actually need. You can also use it to keep chaining queries. Example, you can select only the id instead of the whole model and limit the number of results returned by the database:

unique_disclosures_by_pdf_sha256(company).select(:id).limit(10).each { |d| puts d }

Upvotes: 2

jaspreet-singh-3911
jaspreet-singh-3911

Reputation: 211

You can achieve this by using uniq method

 Company.first.disclosures.to_a.uniq(&:pdf_sha256)

This will return you the disclosures records uniq by cloumn "pdf_sha256"

Hope this helps you! Cheers

Upvotes: 2

NM Pennypacker
NM Pennypacker

Reputation: 6952

Why not:

ids = Disclosure.select(:id, :pdf_sha256).distinct.map(&:id)
Disclosure.find(ids)

The id sill be distinct either way since it's the primary key, so all you have to do is map the ids and find the Disclosures by id.

Upvotes: 0

user8732191
user8732191

Reputation:

Model.select(:rating)

Result of this is an array of Model objects. Not plain ratings. And from uniq's point of view, they are completely different. You can use this:

Model.select(:rating).map(&:rating).uniq

or this (most efficient)

Model.uniq.pluck(:rating)

Model.distinct.pluck(:rating)

Update

Apparently, as of rails 5.0.0.1, it works only on "top level" queries, like above. Doesn't work on collection proxies ("has_many" relations, for example).

Address.distinct.pluck(:city) # => ['Moscow']
user.addresses.distinct.pluck(:city) # => ['Moscow', 'Moscow', 'Moscow']

In this case, deduplicate after the query

user.addresses.pluck(:city).uniq # => ['Moscow']

Upvotes: -1

Related Questions