Krøllebølle
Krøllebølle

Reputation: 3028

Filter out unique data from PostgreSQL query result in Javascript

I have a PostgreSQL database from which I query data using node.js with the pg module. The resulting array in my Javascript code (on the server side) contains lots of duplicate values because that's the way the data is returned. I believe the query is ok, so the values are duplicated because of multiple INNER JOINS with tables that are independent of each other, so that a result of A and B from one table (qa_layers in this case) with a result of C and D from another table (qa_cases in this case), together with a common value E from a third table (qa_settings in this case )will be represented as

| settings_name | layer_name | case_name |
| E             | A          | C         |
| E             | A          | D         |
| E             | B          | C         |
| E             | B          | D         |

As far as I know, this should be the expected behavior. What I need to do is to filter out the unique data for each column as arrays with unique values, e.g. [A, B] and [C, D] for layer_name and case_name. In my mind there could be several ways to achieve this, but each of them seems a little painful, unless there is some helper library or functionality I don't know about.

  1. Parse the resulting array from the database in Javascript. This seems painful to do because of multiple for loops, but it might be easy if there is some library that can do what I need.
  2. Enhance the PostgreSQL query so that no or less parsing is needed. I don't know if this is possible to do, but if the resulting array had easier parsable objects in them the parsing could be less painful. My query is pasted at the end.
  3. Perform multiple queries on the database to get the proper data. For me, this seems like the least scalable solution and the overhead would be a lot less when parsing an array instead of running multiple, consecutive queries.

My PostgreSQL query:

SELECT run.id, settings.name AS settings_name, layer.name AS layer_name, qa_case.name AS case_name FROM qa_runs AS run
    INNER JOIN qa_composites_in_run AS cr
        ON cr.run_id = run.id
    INNER JOIN qa_layers_in_composite AS lc
        ON lc.composite_name = cr.composite_name
    INNER JOIN qa_layers AS layer
        ON layer.name = lc.layer_name
    INNER JOIN qa_cases_in_run AS case_run
        ON case_run.run_id = run.id
    INNER JOIN qa_cases AS qa_case
        ON qa_case.name = case_run.case_name
    INNER JOIN qa_settings AS settings
        ON settings.name = run.settings_name
WHERE run.id IN (27,28,29);

My actual resulting Javascript array:

[
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "VisitLondon"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "VisitRotterdam"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "wtf"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "VisitLondon"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "VisitRotterdam"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "wtf"
    },
    {
        "id": 28,
        "settings_name": "QA Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "VisitLondon"
    },
    {
        "id": 28,
        "settings_name": "QA Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "wtf"
    },
    {
        "id": 27,
        "settings_name": "QA Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "VisitLondon"
    },
    {
        "id": 27,
        "settings_name": "QA Default",
        "layer_name": "OpenStreetMapService",
        "case_name": "VisitRotterdam"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "Map2D",
        "case_name": "VisitLondon"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "Map2D",
        "case_name": "VisitRotterdam"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "Map2D",
        "case_name": "wtf"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "Map2D",
        "case_name": "VisitLondon"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "Map2D",
        "case_name": "VisitRotterdam"
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_name": "Map2D",
        "case_name": "wtf"
    },
    {
        "id": 28,
        "settings_name": "QA Default",
        "layer_name": "Map2D",
        "case_name": "VisitLondon"
    },
    {
        "id": 28,
        "settings_name": "QA Default",
        "layer_name": "Map2D",
        "case_name": "wtf"
    },
    {
        "id": 27,
        "settings_name": "QA Default",
        "layer_name": "Map2D",
        "case_name": "VisitLondon"
    },
    {
        "id": 27,
        "settings_name": "QA Default",
        "layer_name": "Map2D",
        "case_name": "VisitRotterdam"
    }
]

How I want the array/JSON to be in the end:

[
    {
        "id": 27,
        "settings_name": "QA Default",
        "layer_names": [
            {
                "layer_name": "OpenStreetMapService"
            },
            {
                "layer_name": "Map2D"
            }
        ],
        "case_names": [
            {
                "case_name": "VisitLondon"
            },
            {
                "case_name": "VisitRotterdam"
            }
        ]
    },
    {
        "id": 28,
        "settings_name": "QA Default",
        "layer_names": [
            {
                "layer_name": "OpenStreetMapService"
            },
            {
                "layer_name": "Map2D"
            }
        ],
        "case_names": [
            {
                "case_name": "VisitLondon"
            },
            {
                "case_name": "wtf"
            }
        ]
    },
    {
        "id": 29,
        "settings_name": "Test Default",
        "layer_names": [
            {
                "layer_name": "OpenStreetMapService"
            },
            {
                "layer_name": "Map2D"
            }
        ],
        "case_names": [
            {
                "case_name": "VisitLondon"
            },
            {
                "case_name": "VisitRotterdam"
            },
            {
                "case_name": "wtf"
            }
        ]
    }
]

Upvotes: 0

Views: 694

Answers (1)

Craig Ringer
Craig Ringer

Reputation: 324551

I suggest doing this on the PostgreSQL side, by constructing and returning a json object with your nested data instead of returning a join relation.

You can do this fairly easily with the json support in 9.3 and above. Without sample data and schema I can't really convert your query for you, but you'll want to start with json_agg and row_to_json. Have a look for other related questions with answers referencing those functions too.

Here's a simple example:

CREATE TABLE parent(
    id integer primary key,
    parentdata text
);

CREATE TABLE child(
    id integer primary key,
    parent_id integer not null references parent(id),
    childdata text
);

INSERT INTO parent(id, parentdata) VALUES 
(1, 'p1'), (2, 'p2'), (3, 'p3');

INSERT INTO child(id, parent_id, childdata)
VALUES
(10, 1, 'c1_10'),
(20, 2, 'c2_20'),
(21, 2, 'c2_21');

SELECT row_to_json(toplevel, true)
FROM (
  SELECT p.id, p.parentdata, json_agg(c) AS children 
  FROM parent p 
  LEFT OUTER JOIN child c ON (p.id = c.parent_id) GROUP BY p.id
) toplevel;

which emits:

{"id":1,
 "parentdata":"p1",
 "child":[{"id":10,"parent_id":1,"childdata":"c1_10"}]}
{"id":2,
 "parentdata":"p2",
 "child":[{"id":20,"parent_id":2,"childdata":"c2_20"}, 
 {"id":21,"parent_id":2,"childdata":"c2_21"}]}
{"id":3,
 "parentdata":"p3",
 "children":[null]}

If you had to de-duplicate, you'd probably want to do so by ensuring that your ORDER BY clause in the SQL was fully specified, so that the rows were ordered from outer to inner object. Then you could scan the table linearly in js and ignore rows where the id for an object is the same as the id for the same entity in the previous row. It's a pretty simple single for loop with some state tracking variables. The downside is that PostgreSQL has to sort the result set.

Upvotes: 1

Related Questions