Nicola Cossu
Nicola Cossu

Reputation: 56407

what's the utility of array type?

I'm totally newbie with postgresql but I have a good experience with mysql. I was reading the documentation and I've discovered that postgresql has an array type. I'm quite confused since I can't understand in which context this type can be useful within a rdbms. Why would I have to choose this type instead of using a classical one to many relationship?

Thanks in advance.

Upvotes: 7

Views: 924

Answers (3)

Denis de Bernardy
Denis de Bernardy

Reputation: 78571

I've been using them successfully to aggregate recursive tree references using triggers.

For instance, suppose you've a tree of categories, and you want to find products in any of categories (1,2,3) or any of their subcategories.

One way to do it is to use an ugly with recursive statement. Doing so will output a plan stuffed with merge/hash joins on entire tables and an occasional materialize.

with recursive categories as (
select id
from categories
where id in (1,2,3)
union all
...
)
select products.*
from products
join product2category on...
join categories on ...
group by products.id, ...
order by ... limit 10;

Another is to pre-aggregate the needed data:

categories (
  id int,
  parents int[] -- (array_agg(parent_id) from parents) || id
)

products (
  id int,
  categories int[] -- array_agg(category_id) from product2category
)

index on categories using gin (parents)

index on products using gin (categories)

select products.*
from products
where categories && array(
      select id from categories where parents && array[1,2,3]
      )
order by ... limit 10;

One issue with the above approach is that row estimates for the && operator are junk. (The selectivity is a stub function that has yet to be written, and results in something like 1/200 rows irrespective of the values in your aggregates.) Put another way, you may very well end up with an index scan where a seq scan would be correct.

To work around it, I increased the statistics on the gin-indexed column and I periodically look into pg_stats to extract more appropriate stats. When a cursory look at those stats reveal that using && for the specified values will return an incorrect plan, I rewrite applicable occurrences of && with arrayoverlap() (the latter has a stub selectivity of 1/3), e.g.:

select products.*
from products
where arrayoverlap(cat_id, array(
      select id from categories where arrayoverlap(parents, array[1,2,3])
      ))
order by ... limit 10;

(The same goes for the <@ operator...)

Upvotes: 1

mu is too short
mu is too short

Reputation: 434965

I've used them to make working with trees (such as comment threads) easier. You can store the path from the tree's root to a single node in an array, each number in the array is the branch number for that node. Then, you can do things like this:

SELECT id, content
FROM nodes
WHERE tree = X
ORDER BY path -- The array is here.

PostgreSQL will compare arrays element by element in the natural fashion so ORDER BY path will dump the tree in a sensible linear display order; then, you check the length of path to figure out a node's depth and that gives you the indentation to get the rendering right.

The above approach gets you from the database to the rendered page with one pass through the data.

PostgreSQL also has geometric types, simple key/value types, and supports the construction of various other composite types.

Usually it is better to use traditional association tables but there's nothing wrong with having more tools in your toolbox.

Upvotes: 4

One SO user is using it for what appears to be machine-aided translation. The comments to a follow-up question might be helpful in understanding his approach.

Upvotes: 1

Related Questions