I have distribution_order and distribution_order_item tables with the following schema.
CREATE TABLE public.distribution_order
(
    distributionorderid bigint NOT NULL DEFAULT nextval('distribution_order_distributionorderid_seq'::regclass),
    sourcelocationid character varying(20) COLLATE pg_catalog."default" NOT NULL,
    destinationlocationid character varying(20) COLLATE pg_catalog."default" NOT NULL,
    distributionorderheader jsonb NOT NULL,
    status character varying(25) COLLATE pg_catalog."default" NOT NULL,
    createdtimestamp timestamp without time zone NOT NULL,
    lastupdatedtimestamp timestamp without time zone,
    CONSTRAINT distribution_order_pkey PRIMARY KEY (distributionorderid)
)
---------------------------------------------------
CREATE TABLE public.distribution_order_item
(
    distributionorderid bigint NOT NULL,
    packid character varying(50) COLLATE pg_catalog."default" NOT NULL,
    status character varying(25) COLLATE pg_catalog."default" NOT NULL,
    itemdata jsonb NOT NULL,
    createdtimestamp timestamp without time zone NOT NULL,
    lastupdatedtimestamp timestamp without time zone,
    CONSTRAINT uq_distribution_order_item UNIQUE (distributionorderid, packid),
    CONSTRAINT fk_distributionorderid FOREIGN KEY (distributionorderid)
        REFERENCES public.distribution_order (distributionorderid) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
)
distribution_order.distributionorderheader jsonb schema:
{
    "orderType": "stock",
    "destinationLocationId": "1008",
    "orderIds": [
        "e63b9007-dcaa-4c33-bd1d-e5fbcced0913"
    ],
    "sourceLocationId": "1002",
    "deliveryDate": "2018-07-13T23:00:00.000Z",
    "orderedDate": "2018-07-11T17:00:00.000Z"
}
distribution_order_item.itemdata jsonb schema:
{
    "packId": "100003366",
    "orderedQuantity": 0,
    "pickedInfo": [
        {
            "pickDate": "2019-04-17T10:05:36Z",
            "lotNumber": "20191218",
            "quantityType": "CS",
            "containerId": "10000000000000014290",
            "quantity": "4.000000"
        }
    ],
    "shipInfo": [
        {
            "shippedDate": "2019-04-17T10:05:36Z",
            "lotNumber": "20191219",
            "quantityType": "CS",
            "containerId": "33333",
            "shippedQuantity": "4.000000"
        }
    ]
}
itemdata in the distribution_order_item table is updated on each shipment received by appending another jsonb block under itemdata.shipInfo. I have a requirement to calculate the shipped quantity by summing shippedQuantity under itemdata.shipInfo, and the picked quantity by summing quantity under itemdata.pickedInfo. The query I have written is below; it is long-running, so any help to optimize it will be highly appreciated.
WITH D_ORD AS (
SELECT distributionorderid,
destinationlocationid,
distributionorderheader->>'orderedDate' as od,
distributionorderheader->>'deliveryDate' as dd
FROM distribution_order
WHERE distributionorderheader->>'deliveryDate' > '2019-04-23 17:09:46.46' ),
D_ORD_ITEMS AS (
SELECT
d.distributionorderid,
i.itemdata->>'itemId' AS itemid,
d.od::TIMESTAMP AS ordereddate,
d.destinationlocationid,
i.packid AS unitid,
i.itemdata->>'orderedQuantity' orderedquantity,
CASE WHEN i.status='SHIPPED' OR i.status='COMPLETE'
THEN CASE WHEN (i.itemdata->>'orderedQuantity')::float - sum((s->>'shippedQuantity')::float) >= 0
THEN (i.itemdata->>'orderedQuantity')::float - sum((s->>'shippedQuantity')::float)
ELSE 0
END
ELSE CASE WHEN i.status='PICKED'
THEN CASE WHEN (i.itemdata->>'orderedQuantity')::float - sum((p->>'quantity')::float) >= 0
THEN (i.itemdata->>'orderedQuantity')::float - sum((p->>'quantity')::float)
ELSE 0
END
ELSE 0
END
END AS cancelorderquantity,
CASE WHEN (CASE WHEN i.status='SHIPPED' OR i.status='COMPLETE'
THEN (i.itemdata->>'orderedQuantity')::float - sum((s->>'shippedQuantity')::float)
ELSE 0
END >0)
THEN CURRENT_DATE::TIMESTAMP::text
ELSE CASE WHEN (CASE WHEN i.status='PICKED'
THEN (i.itemdata->>'orderedQuantity')::float - sum((p->>'quantity')::float)
ELSE 0
END >0)
THEN CURRENT_DATE::TIMESTAMP::text
ELSE ''
END
END AS cancelleddate,
CASE WHEN (sum((s->>'shippedQuantity')::float) > 0 OR sum((p->>'quantity')::float) >0)
THEN CURRENT_DATE::TIMESTAMP::text
ELSE ''
END AS arrivedate ,
CASE WHEN extract(HOUR FROM (d.dd)::TIMESTAMP) =23
THEN ((d.dd::TIMESTAMP + interval '1 DAY')::date + interval '6 hour')
ELSE d.dd::TIMESTAMP::date + interval '6 hour'
END AS exp_av,
CASE WHEN sum((s->>'shippedQuantity')::float) >0
THEN sum((s->>'shippedQuantity')::float)
ELSE CASE WHEN sum((p->>'quantity')::float)>0
THEN sum((p->>'quantity')::float)
ELSE 0
END
END AS receivedquantity
FROM D_ORD d, distribution_order_item i
LEFT JOIN jsonb_array_elements(i.itemdata->'shipmentInfo') s ON TRUE
LEFT JOIN jsonb_array_elements(i.itemdata->'pickedInfo') p ON TRUE
GROUP BY d.distributionorderid,d.destinationlocationid, i.itemdata->>'itemId', d.od::TIMESTAMP, i.packid,
i.itemdata->>'orderedQuantity',
i.itemdata->>'packSize',
i.status,
d.dd::TIMESTAMP)
SELECT * from D_ORD_ITEMS
If the JSON's schema is well-defined, don't use JSON. Refactor your table to use normal fields instead of JSON, then create an index on the performance-critical fields. JSON fields are good for a loosely-defined schema, like patients' records.
When deciding between normal fields and JSON fields, walk away from the database first. Think of which application you would use if there were no RDBMS to save your data.
If you would reach for Excel to save your data, that's a sign your data model tends to be tabular...
action      product     qty
received    mouse         3
sold        mouse         2
received    keyboard     26
...then just use normal fields; don't go fancy with JSON fields.
If you would instead reach for Word (or even OneNote or Notepad) to save your data (e.g., patients' records), that's a good indicator that your data model is loosely defined, which is exactly what JSON can facilitate, so by all means use JSON. Since not all patients have the same kinds of details to keep, it would be hard, if not impossible, to add new fields to the table every time a new detail needs to be recorded; so yes, use JSON instead.
Patient: John Doe
Intake Criteria:
    Dm Dx Date: 2012/03/12
    Initial Hgb A1c: 6.5
    Co Morbid: Hypertension, CAD
Labs:
    LDL Cholesterol:
        LDL Level: 122.5
        LDL Result Date: 2012/07/06
    Serum Creatinine:
        Creatinine Level: 1.4
        Creatinine Result Date: 12/07/08
------
Patient: Jane Doe
Blood Type: DB
CareLocation:
    FacilityName: East Side Clinic
    ContactEmail: [email protected]
Weight: 60kg
If your data tends to be document-based (think OneNote or Word), use JSON. If your data tends to be sheet-based, that is, tabular (think Excel), don't use JSON; use normal fields instead. Normal fields are natural for an RDBMS to access, and natural for an RDBMS to create an index on.
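For instance, applied to your schema, the refactor could look roughly like the sketch below. This is only one possible shape; the table and column names (distribution_order_item_v2, distribution_order_item_shipment, orderedquantity, and so on) are assumptions for illustration, not anything your current code already uses:
-- Scalar, well-defined JSON properties become normal columns.
create table distribution_order_item_v2
(
    distributionorderid bigint not null references distribution_order (distributionorderid),
    packid character varying(50) not null,
    status character varying(25) not null,
    orderedquantity numeric not null default 0,
    constraint uq_distribution_order_item_v2 unique (distributionorderid, packid)
);

-- Each repeating shipInfo entry becomes a row in a child table;
-- pickedInfo would get the same treatment.
create table distribution_order_item_shipment
(
    distributionorderid bigint not null,
    packid character varying(50) not null,
    shippeddate timestamp without time zone,
    lotnumber character varying(25),
    shippedquantity numeric not null,
    foreign key (distributionorderid, packid)
        references distribution_order_item_v2 (distributionorderid, packid)
);

-- Normal columns can be indexed and summed directly.
create index ix_doi_shipment__order_pack
    on distribution_order_item_shipment (distributionorderid, packid);
With that shape, shipped and picked quantities become a plain sum() over indexed columns instead of expanding a jsonb array on every row.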
If refactoring the database to convert your JSON properties to normal fields can't be accommodated in your current development schedule, you can still have your JSON properties treated like "normal fields" by creating an index on those properties.
Create an index on the well-defined data, i.e., the properties that are regularly accessed and/or always present in your JSON; most of them show up in your GROUP BY or WHERE clauses.
Fortunately, Postgres can create an index on an expression, which means you can also create an index on a JSON expression (e.g., itemdata->>'itemId'). If you've used another RDBMS that can't index an expression, you've painted yourself into a corner, as execution plans for queries that use JSON will always resolve to a sequential scan of the table instead of an index scan.
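Applied to your query, a minimal sketch would be an expression index matching the filter in your D_ORD CTE. This assumes you keep comparing distributionorderheader->>'deliveryDate' as text exactly as written; the planner only considers an expression index when the query uses the same expression:
-- Matches: WHERE distributionorderheader->>'deliveryDate' > '...'
create index ix_distribution_order__deliverydate
    on distribution_order ((distributionorderheader->>'deliveryDate'));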
It's still better design to use normal fields instead of JSON properties for data that is well-defined and regularly used. Factor the refactoring of your database design into your development schedule and convert those JSON fields to normal fields later.
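When you do get room for that, the conversion itself can be a fairly mechanical backfill. A rough sketch, where the column name, the numeric type, and the index name are just assumptions for illustration:
-- Add a normal column, backfill it from the JSON property, then index it.
alter table distribution_order_item
    add column orderedquantity numeric;

update distribution_order_item
   set orderedquantity = (itemdata->>'orderedQuantity')::numeric;

create index ix_distribution_order_item__orderedqty
    on distribution_order_item (orderedquantity);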
To cut to the chase, here's a proof of concept showing that a query involving JSON can still benefit from the performance associated with normal fields, by creating an index on regularly used JSON properties.
create table tbl
(
    id int generated by default as identity primary key,
    -- on older versions of Postgres, use this instead:
    -- id serial primary key
    document jsonb not null
);
create index ix_tbl__document_code on tbl(((document->>'code')::int));
insert into tbl(document)
select
    jsonb_build_object(
        'name', 'Slim',
        'nick', 'Shady',
        'category', 'Musician',
        'ack', x.i,
        'code', x.i
    )
from generate_series(1, 500000) as x(i);
Here's the execution plan when there's an index on the (document->>'code')::int expression:
explain analyze
select * from tbl
where (document->>'code')::int = 42;
And here's the execution plan when there is no index created on the expression:
explain analyze
select * from tbl
where (document->>'ack')::int = 42;
0.018 milliseconds vs 52.335 milliseconds