Reputation: 123
I have a parent
and child
table as following:
create table parent
(
identifier serial primary key,
name text
);
create table child
(
identifier serial primary key,
name text, parent_identifier integer references parent
);
I created two utility functions to format a parent
and child
row to a JSON object:
create function format(child) returns json
as $$
select json_build_object('identifier', $1.identifier, 'name', $1.name)
$$ language sql stable;
create function format(parent) returns json
as $$
select json_build_object('identifier', $1.identifier, 'name', $1.name,
'children', array(select format(child) from child where parent_identifier = $1.identifier))
$$ language sql stable;
Let's test this:
insert into parent(name) values('first parent');
insert into parent(name) values('second parent');
insert into child(name, parent_identifier) values('first child first parent', (select identifier from parent where name = 'first parent'));
insert into child(name, parent_identifier) values('second child first parent', (select identifier from parent where name = 'first parent'));
insert into child(name, parent_identifier) values('first child second parent', (select identifier from parent where name = 'second parent'));
select format(parent) from parent;
This returns the following JSON objects:
{
"identifier":5,
"name":"first parent",
"children":[
{
"identifier":7,
"name":"first child first parent"
},
{
"identifier":8,
"name":"second child first parent"
}
]
}
{
"identifier":6,
"name":"second parent",
"children":[
{
"identifier":9,
"name":"first child second parent"
}
]
}
Great! Though, this has one big problem: if another transaction does some changes in between our insert
and select
queries, the select
query does exactly return what we just inserted. We can fix this by setting the transaction isolation level to repeatable read
, but that has its performance costs and other drawbacks (we might have to retry).
So I thought about rewriting the queries above in one single CTE. If I am not mistaken, this will not suffer from such concurrency issues. I started as following:
with
parents as
(
insert into parent(name)
select 'first parent'
union all
select 'second parent'
returning parent.identifier, parent.name
),
children as
(
insert into child(name, parent_identifier)
select 'first child first parent', identifier from parents where name = 'first parent'
union all
select 'second child first parent', identifier from parents where name = 'first parent'
union all
select 'first child second parent', identifier from parents where name = 'second parent'
)
select format(parents::parent) from parents;
This does not work as expected. It returns the following JSON objects:
{
"identifier":7,
"name":"first parent",
"children":[]
}
{
"identifier":8,
"name":"second parent",
"children":[]
}
As you can see, there are no children included. After some reading, I understand what is going on. The CTE works on a snapshot created right before when the query was started. In format(parent)
, we are doing select format(child) from child where parent_identifier = $1.identifier)
, but that does not yield any child rows, because the child rows are not in the snapshot. So my question is not about this, as I understand this.
Of course, I could easily fix this if I simple do the json_build_object
stuff, exactly the same as in the format
functions, in the main query, but then I am duplicating code. I am using these format
functions also in other queries, unrelated to this question. Ideally, I want to avoid code duplication in my solution. So I would like to keep using them and probably need to refactor them first so they can be used in the scenario in this question.
I'm quite stuck now. I really would like to continue with the CTE (so I can avoid having to set the transaction isolation level to repeatable read
), but I cannot find a way to re-factor the format(parent)
and format(child)
functions and/or the CTE so I don't end up with code duplicates all over the place. Is there a smart soul on SO with some clever ideas?
Note that I am using PostgreSQL 10.1. Please find a fiddle here: http://sqlfiddle.com/#!17/a251d/2
update regarding Laurenz Albe's answer
Context: https://stackoverflow.com/revisions/48152380/1
In the question above, I was indeed simplifying my situation. Let me explain the real scenario more closely, without going into too much details which will be confusing.
In the scenario, the user is providing data (= parents
and their corresponding children
) for a certain date range, for example the month January 2018. Also, I am not just doing inserts, I am actually doing upserts and deletes of orphaned rows. So, the scenario is simple: the client is replacing all data for a given date range.
If I then do select format(parent) from parent where <parent is in date range as provided>
after the upserts and deletes, some other client might have altered an overlapping date range in between. In that case, I am returning different results as provided by the client, which might introduce bugs if the clients are not properly implemented. Hence, that's why I believe the inserts and select need be part of the same transaction with transaction isolation level set to repeatable read
.
But then, I started thinking about one single, fat CTE, hence my question.
I hope this clarifies the scenario.
Upvotes: 3
Views: 228
Reputation: 247455
As you noticed correctly, you cannot see rows modified in the CTE in the main SELECT
. This is documented:
The sub-statements in
WITH
are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements inWITH
, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables. This alleviates the effects of the unpredictability of the actual order of row updates, and means thatRETURNING
data is the only way to communicate changes between differentWITH
sub-statements and the main query.
So you should use RETURNING
.
I guess the simplest way would be not to use a function, but to perform json_build_object
in the main query and have it operate on the CTEs parents
and children
.
Upvotes: 1