tiimgreen
tiimgreen

Reputation: 11

Returning json array/object of nested data

I have a postgres database of jobs, locations and countries, where jobs <=> locations is many-to-many, jobs <=> countries is many-to-many, and locations <= countries is one-to-many. A job can belong to a country through 2 different paths:

jobs <=> jobs_locations <=> locations <=> countries
jobs <=> jobs_countries <=> countries

This is my current query:

select
  jobs.*,
  COALESCE(json_agg(locations) filter (where locations.id IS NOT NULL), '[]') as locations,
  COALESCE(json_agg(countries) filter (where countries.id IS NOT NULL), '[]') as countries
from jobs
left join jobs_locations on jobs.id = jobs_locations.job_id
left join locations on jobs_locations.location_id = locations.id
left join countries loc_country on locations.country_id = loc_country.id
left join jobs_countries on jobs.id = jobs_countries.job_id
left join countries on jobs_countries.country_id = countries.id
where loc_country.id = 2 OR countries.id = 2
group by jobs.id;

I'm currently getting the data in this format:

{
  id: 1,
  ...
  locations: [
    {
      id: 10,
      name: "Paris",
      country_id: 100
    }
  ]
}

I want to get a job in this format:

{
  id: 1,
  ...
  locations: [
    {
      id: 10,
      name: "Paris",
      country: {
        id: 100,
        name: "France"
      }
    }
  ]
}

How can I do this? It seems like a simple query but I've been trying for hours to get this data in the right format, what am I missing here?

Upvotes: 1

Views: 46

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658947

Many ways lead to Rome.

This returns all jobs, and only appends locations and/or countries for country_id = 2. Should be among the fastest ways:

SELECT j.*
     , coalesce(l.locations, '[]') AS locations
     , coalesce(c.countries, '[]') AS countries
FROM   jobs j
LEFT   JOIN (
   SELECT jl.job_id AS id
        , jsonb_agg(l) - 'country_id' AS locations  -- !!!
   FROM   jobs_locations jl
   JOIN  (
      SELECT l.*, to_jsonb(c) AS country            -- subquery to prepare row !!!
      FROM   locations l
      JOIN   countries c ON c.id = l.country_id
      WHERE  l.country_id = 2
      ) l ON l.id = jl.location_id
   GROUP  BY 1
   ) l USING (id)
LEFT   JOIN (
   SELECT jc.job_id AS id
        , jsonb_agg(c) AS countries
   FROM   jobs_countries jc
   JOIN   countries c ON c.id = jc.country_id
   WHERE  jc.country_id = 2
   GROUP  BY 1
   ) c USING (id);

First of, your original seems incorrect to begin with. Multiple uncorrelated LEFT JOIN instances result in multiplication of rows. See:

And then you add a WHERE condition on "right" tables, which counteracts all the LEFT JOIN clauses ... See:

I fixed that before adding an answer to your question. Note how I use jsonb instead of json to allow convenient operators. Details for that solution:

One side-effect: keys appear in jsonb-specific sort order.

Only get jobs with countries and/or locations

At least one of the of the joins must find a row. (That's how I understand your comment.)

You could simply append your (adapted) original WHERE clause in the outer SELECT:

...
WHERE 2 IN (l.id, c.id);

But try this more elegant rewrite with a FULL JOIN between locations and countries, followed by a JOIN to jobs. If the subqueries are selective (like it seems), that should be (much) faster because it can filter jobs right away, during the join on the PK column id:

SELECT j.*
     , coalesce(l.locations, '[]') AS locations
     , coalesce(c.countries, '[]') AS countries
FROM  (
   SELECT jl.job_id AS id
        , jsonb_agg(l) - 'country_id' AS locations
   FROM   jobs_locations jl
   JOIN  (
      SELECT l.*, to_jsonb(c) AS country
      FROM   locations l
      JOIN   countries c ON c.id = l.country_id
      WHERE  l.country_id = 2
      ) l ON l.id = jl.location_id
   GROUP  BY 1
   ) l
FULL   JOIN (
   SELECT jc.job_id AS id
        , jsonb_agg(c) AS countries
   FROM   jobs_countries jc
   JOIN   countries c ON c.id = jc.country_id
   WHERE  jc.country_id = 2
   GROUP  BY 1
   ) c USING (id)
JOIN   jobs j USING (id);

You should be comfortable with all join clauses and NULL handling to understand subtleties. In particular, matching column aliases (id) and the USING clause in the FULL [OUTER] JOIN are essential to produce the single, combined id column for the subsequent [INNER] JOIN. This is one of the rare cases where the USING clause is essential.
(Well, we could spell out JOIN jobs j ON j.id = COALESCE(l.id, c.id) but that's far less elegant.)

Upvotes: 1

Related Questions