Reputation: 13930
As in my last question (see details there), I am using
SELECT encode(digest(x::text::bytea, 'sha1'), 'hex') FROM xtmp;
Not solved, is not the same hash tham original... Perhaps the ::text
enforced internal representation with \n
symbols, so a solution will be direct cast to bytea
, but it is an invalid cast.
The other workaround also not is a solution,
SELECT encode(digest( replace(x::text,'\n',E'\n')::bytea, 'sha1' ), 'hex')
FROM xtmp
... I try CREATE TABLE btmp (x bytea)
and COPY btmp FROM '/tmp/test.xml' ( FORMAT binary )
, but error ("unknown COPY file signature").
Upvotes: 6
Views: 12887
Reputation: 13930
Simple solution! Add a "\n".
SELECT encode(digest((x::text||E'\n')::bytea, 'sha1'), 'hex') FROM xtmp;
But the real problem is to get original file without cut the last "\n" (last EOL)... Lets see the function at my old test-kit:
INSERT INTO xtmp (x)
SELECT array_to_string(array_agg(x),E'\n')::xml FROM ttmp
;
It is here the "bug" (after workaround the COPY
that not load a full-file into one row one field).
The array_to_string()
not adds the last EOL, so concatenating by || E'\n'
fixed the bug.
Checking other hypothesis and offering a good solution to the test-kit.
The end-of-line (EOL) is an obligation at POSIX filesystems (and non-binary mode), see this answer about EOL. We can imagine something like "string and file representations differ by an EOL"... We can check? it differs?
We can demonstrate by terminal that there are no "string vs file" problem, no strange addiction of EOL:
printf "<root/>" > original1.xml
printf "<root/>\n" > original2.xml
sha1sum original*.xml
printf "<root/>" | openssl sha1
printf "<root/>\n" | openssl sha1
results
062c3db8ce3458fc3ccaf2f930bf663d8ce31d7d original1.xml
a05d91cbf0902b0fe341c979e9fc18fc69813f55 original2.xml
(stdin)= 062c3db8ce3458fc3ccaf2f930bf663d8ce31d7d
(stdin)= a05d91cbf0902b0fe341c979e9fc18fc69813f55
So sha1sum
is not using an extra EOL, the string and the file are the some.
Now at SQL, the same conclusions:
SELECT encode(digest('<root/>'::bytea, 'sha1'), 'hex') ;
SELECT encode(digest(E'<root/>\n'::bytea, 'sha1'), 'hex') ;
results
062c3db8ce3458fc3ccaf2f930bf663d8ce31d7d
a05d91cbf0902b0fe341c979e9fc18fc69813f55
The COPY
command is ugly for this simple load/save text procedure, lets use a direct getfile function instead:
CREATE FUNCTION getfile(p_file text) RETURNS text AS $$
with open(args[0],"r") as content_file:
content = content_file.read()
return content
$$ LANGUAGE PLpythonU;
SELECT encode(digest( getfile('/tmp/original1.xml') ::bytea, 'sha1'), 'hex') ;
SELECT encode(digest( getfile('/tmp/original2.xml') ::bytea, 'sha1'), 'hex') ;
results
062c3db8ce3458fc3ccaf2f930bf663d8ce31d7d
a05d91cbf0902b0fe341c979e9fc18fc69813f55
Perfect (!), no EOL problem now.
Upvotes: 5