freddy
freddy

Reputation: 17

Best practice for storing and later accessing JSON object in rails database

I'm creating a Ruby-on-rails where photos can be uploaded, analysed and tagged. We're using Azure's Cognitive Services which can e.g. return a list of tags associated with a given image.

An example of tags for one image in JSON format is:

{"tags":[
{"tag_name":"outdoor","tag_confidence":"99.87"},
{"tag_name":"clothing","tag_confidence":"99.56"},
{"tag_name":"person","tag_confidence":"99.55"},
{"tag_name":"human face","tag_confidence":"93.33"},
{"tag_name":"tree","tag_confidence":"93.17"},
{"tag_name":"smile","tag_confidence":"91.33"}}

Right now we're just storing this as a hash in the table, but later we'd like to be able to do filtering or queries, where we can e.g. get all images with the person-tag. How do I go about creating the model and the tables for this in the best way?

Upvotes: 0

Views: 309

Answers (1)

max
max

Reputation: 101811

If you're planning on querying the data and it has a regular structure then you might be better served by using actual tables.

# rails g model tag name:string:uniq
class Tag < ApplicationRecord
  validate_uniqueness_of :name
  has_many :taggings
  has_many :photos, through: :taggings
end

# rails g model tagging tag:belongs_to photo:belongs_to confidence:float
# add a unique index on tag_id and photo_id
class Tagging < ApplicationRecord
  validate_uniqueness_of :tag_id, scope: :photo_id
  belongs_to :tag
  belongs_to :photo
end

class Photo < ApplicationRecord
  has_many :taggings
  has_many :tags, through: :taggings
end
class TaggingService
  def initialize(photo)
    @photo = photo
    @tag_data = get_it_somehow(photo)
  end

  def perform(photo)
    @tag_data.each do |raw|
      photo.taggings.create(
        confidence: raw["tag_confidence"],
        tag: Tag.find_or_create_by(name: raw["tag_name"])
      )
    end
  end

  def self.perform(photo)
    new(photo).perform
  end
end

You can then for example query for a photo with a given tag just by:

Photo.joins(:tags)
     .where(tags: { name: 'outdoor' })

Or a set of tags:

Photo.left_joins(:tags)
     .group(:id)
     .where(tags: { name: ['outdoor', 'clothing', 'tree'] })
     .having('COUNT(tags.*) >= 3')

Sure you can do this with a JSON column but the queries are going to be much more difficult to read and you lose all the advantages to having database normalization and having actual models to work with.

One example of this is if you for example want to display tags based on aggregates:

Tag.left_joins(:taggings)
   .order('COUNT(taggings.*)', 'AVG(taggings.confidence)')

If you used a JSONB column you would have to extract that from every row on the photos table with a nightmarish query.

Upvotes: 2

Related Questions