aleherzko
aleherzko

Reputation: 90

Query Array of Arrays Of Hashes using Postgres

I have a model that has a column that looks like this (example):

column = [{:A => "1", :B => "2"}, [a, b, c, ..., n]]

Where a, b, c and n are also hashes, like this (example):

a = {:X => "x", :Y => "y"}

Currently, if I call a record m of the Model, for example, m.column[0][:A] it returns => "1"
And if I call m.column[1][0] I get a = {:X => "x", :Y => "y"}

So far, so good.

Now the question is, how can I get an Array of all the records of the Model that has, for example, column[0][:A] = "1" ???

I'm trying something like this (but it isn't working):

Model.where("column[0][:A] = ?", "1")

The error says:

PG::SyntaxError: ERROR:  syntax error at or near ":"

EDIT:

column is a text datatype
and in its model it has serialize :column, Array

Upvotes: 2

Views: 3105

Answers (1)

mu is too short
mu is too short

Reputation: 434975

Your problem is that you're using serialize to store your data and serialize just slops a big pile of unqueryable YAML into the database. Parsing YAML inside PostgreSQL is certainly possible but it would be slow and a little pointless.

If you really need to store an array of hashes then PostgreSQL's jsonb type would be your best bet. You won't get symbol keys in the database but you will get string keys so this Ruby data:

[{:A => "1", :B => "2"}, {:C => 6, :D => 11}]

would look like this JSON:

[ { "A": "1", "B": "2" }, { "C": 6, "D": 11 } ]

inside the database.

Once you have jsonb in the database, you can query it using all the JSON functions and operators that PostgreSQL providers and you could even index the JSON to support faster querying. In your case, your query would look like:

Model.where("column->0->>'A' = ?", '1')

the -> operator with an integer on the RHS acts like Ruby's Array#[] method:

-> int
Get JSON array element (indexed from zero, negative integers count from the end)

With a string on the RHS it acts like Ruby's Hash#[]. Note that -> returns JSON, not text or integer. The ->> operator is the same as -> but it returns text so you'd use that at the end to make the comparison cleaner.

Alternatively you could say:

Model.where("column->0->'A' = ?", '1'.to_json)

to push the string-vs-JSON logic into Ruby.


Everything is pretty easy once your database schema makes sense (i.e. it is using PostgreSQL's jsonb instead of Rails's serialize) but how do you get there from here? First remove the serialize :column, Array from your model, then you'd need a three step migration something like this:

  1. Add a jsonb column:

    def change
      add_column :models, :column_j, :jsonb
    end
    
  2. Read each column value from the database, unpack the YAML by hand, and write the JSON by hand:

    def up
      connection.execute('select id, column from models').each do |row|
        a = YAML.load(row['column'])
        connection.raw_connection.exec(
          'update models set column_j = $1 where id = $2',
          [ a.to_json, row['id'].to_i ]
        )
      end
    end
    

    Note that you can't use the model for this as the model class and the database no longer agree about the structure and format of the models table.

  3. Replace the old column with the new column_j:

    def change
      remove_column :models, :column, :text
      rename_column :models, :column_j, :column
    end
    

You'll want to backup your database before migrating of course. And hopefully you'll never even think about using serialize again, serialize is a horrible kludge that seems like an easy solution but quickly turns into an RPG aimed right at your foot.

Upvotes: 6

Related Questions