Finding out the hash value of a group of rows

Question

I have a table person in my PostgresSQL database, which contains data of different users.

I need to write a test case, which ensures that some routine does modify the data of user 1, and does not modify data of user 2.

For this purpose, I need to

a) calculate a hash code of all rows of user 1 and those of user 2,

b) then perform the operation under test,

c) calculate the hash code again and

d) compare hash codes from steps a) and c).

I found a way to calculate the hash code for a single row:

SELECT md5(CAST((f.*)AS text))
FROM person f;

In order to achieve my goal (find out whether rows of user 2 have been changed), I need to perform a query like this:

SELECT user_id, SOME_AGGREGATE_FUNCTION(md5(CAST((f.*)AS text)))
FROM person f
GROUP BY user_id;

What aggregate function can I use in order to calculate the hash code of a set of rows?

Note: I just want to know whether any rows of user 2 have been changed. I do not want to know, what exactly has changed.

Ihor Romanchenko · Accepted Answer

The simplest way - just concat all the string form md5 with string_agg. But to use this aggregate correctly you must specify ORDER BY.

Or use md5(string_agg(md5(CAST((f.*)AS text)),'')) with some ORDER BY - it will change if any field of f.* changes and it is cheap to compare.

Answers (2)