Reputation: 3103
I'm exploring the Datomic database and, along the way, trying to implement some of its ideas in SQL incrementally, so as to adjust to its way of modelling data. This question is entirely about SQL; I mention Datomic only as background for why I'm doing this (it may also interest Datomic users, which is why I added the datomic tag).
Generally we are getting rid of separate tables per type, but for this example I will keep a users table rather than use a generic entities table (I may try that later, but not yet).
create table users (
    id uuid,
    identity text -- e.g. 'the yankees', 'man born as john in birmingham on date x/y/z'
);
Then we have an EAVT store, with an extra boolean column, added, that marks each row as an assertion (true) or a retraction (false). This table is append-only: we will never issue UPDATE or DELETE on it.
create table eavt_log (
    user_id uuid,
    attribute text,
    value text,
    added boolean,
    created_at timestamp
);
Now some data to illustrate the intended usage:
-- insert person number 12345 (imagine a national identity or birth certificate number)
-- note: uuid_generate_v4() comes from the uuid-ossp extension
insert into users(id, identity) values (uuid_generate_v4(), 'p-12345');
-- let's insert some facts about a person previously known as John Smith:
insert into eavt_log(user_id, attribute, value, added, created_at) values
    ((select id from users where identity='p-12345'),
     'name', 'John Smith', true, '1911-01-01'),
    ((select id from users where identity='p-12345'),
     'name', 'John Smith', false, '1931-01-01'),
    ((select id from users where identity='p-12345'),
     'name', 'John Bontine Smith', true, '1931-01-01');
To make this useful (any database must provide leverage, as Hickey says), let's try to find all the unretracted names for the person previously known as John Smith.
Here's my (bad) attempt:
-- find all currently unretracted names for person previously known as John Smith. This could
-- be 0, 1 (we hope), or more - it just depends though, and should, on what data has been input.
(select attribute, value from eavt_log
    where user_id = (select id from users where identity='p-12345')
    and attribute = 'name'
    and added = true
    order by created_at desc) -- <- can sneak this in w/o upsetting the except, as it's not in the select.
except
(select attribute, value from eavt_log
    where user_id = (select id from users where identity='p-12345')
    and attribute = 'name'
    and added = false);
That gives:
attribute | value
-----------+--------------------
name | John Bontine Smith
(1 row)
That is correct for the test data we gave it.
We can then try to generalise this to a view:
create view unretracted as (
    (select user_id, attribute, value from eavt_log
        where added = true
        order by created_at)
    except
    (select user_id, attribute, value from eavt_log
        where added = false)
);
The problem is that both of these are flawed: the simple EXCEPT gives an incorrect result when a fact has been added, retracted, and then added again, i.e. if we add
((select id from users where identity='p-12345'),
'name', 'John Smith', false, '1941-01-01');
to the facts inserted above, to denote that person-12345, in 1941, adopted the name John Smith again (without retracting the name 'John Bontine Smith', so in this case we want the system to return two values for his name).
With this data, the earlier retraction of the identical value causes the later re-assertion to be excluded from the result set, even though it has been reasserted. This is because EXCEPT works as a set difference on the selected columns and never looks at created_at (we did not do a linear scan of the log in time order, which I think may be required here?).
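To make the failure concrete, here is a minimal sketch that inlines the facts as a VALUES list instead of reading eavt_log, so it can be run on its own in Postgres. The CTE name facts is made up for illustration, and it assumes the 1941 row is an assertion (added = true), as the description above intends.

```sql
-- Self-contained sketch: facts are inlined rather than read from eavt_log.
-- Assumption: the 1941 row is an assertion (added = true).
with facts(value, added, created_at) as (
    values
        ('John Smith',         true,  date '1911-01-01'),
        ('John Smith',         false, date '1931-01-01'),
        ('John Bontine Smith', true,  date '1931-01-01'),
        ('John Smith',         true,  date '1941-01-01')
)
select value from facts where added
except
select value from facts where not added;
-- Returns only 'John Bontine Smith': the 1931 retraction of 'John Smith'
-- removes that value from the set, regardless of the 1941 re-assertion.
```

Because created_at is not part of either side of the EXCEPT, the ordering of assertions and retractions has no way to influence the result.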
My question (finally!) -- is there a way to achieve this in SQL? Can SQL give us more leverage here?
It seems as if we need a where after the except which reaches back into the first select... but that seems impossible in set theory terms, so I wonder what else SQL can do here.
Upvotes: 0
Views: 117
Reputation: 6713
This is edited for your update, although I think there is still something wrong. You added an additional retracted row, which seems to contradict your text. Assuming that the row is actually added instead of retracted, we can use the below query.
You can use DISTINCT ON in Postgres to get the most recent row for each (user_id, attribute, value) combination. If you do that in a sub-select, the outer query can then keep only the rows for which added = true:
SELECT attribute, value
FROM (
    SELECT DISTINCT ON (eavt_log.user_id, attribute, value)
           attribute, value, added
    FROM eavt_log
    JOIN users ON eavt_log.user_id = users.id
    WHERE attribute = 'name'
    ORDER BY eavt_log.user_id, attribute, value, created_at DESC
) sub
WHERE added = 't';
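The same idea can probably be generalised into a view, along the lines of the unretracted view in the question. The following is only a sketch under some assumptions: the names eavt_log_demo and unretracted_demo are hypothetical, a bigint stands in for the uuid user id so that it runs without any extension, and the 1941 row is treated as an assertion.

```sql
-- Throwaway copy of the log table so the sketch runs on its own
-- (hypothetical names; bigint stands in for the uuid user id).
create temporary table eavt_log_demo (
    user_id    bigint,
    attribute  text,
    value      text,
    added      boolean,
    created_at timestamp
);

insert into eavt_log_demo values
    (1, 'name', 'John Smith',         true,  '1911-01-01'),
    (1, 'name', 'John Smith',         false, '1931-01-01'),
    (1, 'name', 'John Bontine Smith', true,  '1931-01-01'),
    (1, 'name', 'John Smith',         true,  '1941-01-01'); -- assumed re-assertion

-- For each (user_id, attribute, value) keep only the latest log row,
-- then keep the fact only if that latest row is an assertion.
create temporary view unretracted_demo as
select user_id, attribute, value
from (
    select distinct on (user_id, attribute, value)
           user_id, attribute, value, added
    from eavt_log_demo
    order by user_id, attribute, value, created_at desc
) latest
where added;

select value from unretracted_demo order by value;
-- Returns 'John Bontine Smith' and 'John Smith'.
```

One caveat: if an assertion and a retraction of the same (user_id, attribute, value) share the same created_at, the ORDER BY does not say which one wins, so an extra tie-breaker column (for example an insertion sequence) would be needed in that case.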
Edit: here's a fiddle
Upvotes: 1