Reputation: 16656
I want to increase my database performance. In a project, all tables went from int to bigint, which I think is a bad choice not only regarding storage, since int requires 4 bytes and bigint requires 8 bytes, but also regarding performance.
So I created a small table with 10 million entries, using a Python script:
import uuid

rows = 10000000
output = 'insert_description_bigint.sql'

# write one INSERT statement per row into a SQL file
with open(output, 'w') as f:
    f.write("SET search_path = norma;\n")
    # range() excludes the upper bound, so rows + 1 yields all 10 million ids
    for i in range(1, rows + 1):
        random_string = uuid.uuid4()
        query = "insert into description_bigint (description_id, description) values (%d, '%s');\n"
        f.write(query % (i, random_string))
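As a side note, a set-based load with generate_series would populate the table much faster than 10 million single-row INSERT statements; a minimal sketch, with md5() standing in for the Python uuid strings:
-- Sketch: bulk-load the same 10 million rows in one statement
SET search_path = norma;
INSERT INTO description_bigint (description_id, description)
SELECT i, md5(random()::text)
FROM generate_series(1, 10000000) AS i;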
And this is how I created my two tables:
-- BIGINT
DROP TABLE IF EXISTS description_bigint;
CREATE TABLE description_bigint
(
    description_id BIGINT PRIMARY KEY NOT NULL,
    description VARCHAR(200),
    CONSTRAINT description_id_positive CHECK (description_id >= 0)
);
select count(1) from description_bigint;
select * from description_bigint;
select * from description_bigint where description_id = 9999999;
-- INT
DROP TABLE IF EXISTS description_int;
CREATE TABLE description_int
(
    description_id INT PRIMARY KEY NOT NULL,
    description VARCHAR(200),
    CONSTRAINT description_id_positive CHECK (description_id >= 0)
);
After inserting all this data, I ran a query on both tables to measure the difference between them. And to my surprise, they both have the same performance:
select * from description_bigint; -- 11m55s
select * from description_int; -- 11m55s
Am I doing something wrong with my benchmark? Shouldn't int be faster than bigint? Especially since the primary key is by definition an index, which means creating an index for bigint should be slower than creating an index for int with the same amount of data, right?
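For reference, client-side timing of a full-table select includes transfer and rendering time; a sketch of how the server-side execution time could be measured instead (EXPLAIN ANALYZE is standard PostgreSQL):
-- Server-side execution time only, excluding client transfer and rendering
EXPLAIN ANALYZE select * from description_int;
EXPLAIN ANALYZE select * from description_bigint;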
I know that this small thing alone won't make a huge impact on my database's performance, but I want to ensure that we are using best practices and staying focused on performance here.
Upvotes: 18
Views: 17201
Reputation: 121594
On a 64-bit system the two tables are nearly identical. The column description_id in description_int occupies 8 bytes (4 for the integer and 4 as alignment padding). Try this test:
select
    pg_relation_size('description_int')/10000000 as table_int,
    pg_relation_size('description_bigint')/10000000 as table_bigint,
    pg_relation_size('description_int_pkey')/10000000 as index_int,
    pg_relation_size('description_bigint_pkey')/10000000 as index_bigint;
The average row size of both tables is virtually the same. This is because the integer column occupies 8 bytes (4 bytes for the value and 4 bytes of alignment padding), exactly like bigint (8 bytes for the value with no filler). The same applies to index entries. This is a special case, however. If we add one more integer column to the first table:
CREATE TABLE two_integers
(
    description_id INT PRIMARY KEY NOT NULL,
    one_more_int INT,
    description VARCHAR(200),
    CONSTRAINT description_id_positive CHECK (description_id >= 0)
);
the average row size should remain the same, because the first 8 bytes will now hold the two integers with no filler.
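One way to check is to load the new table with the same data and compare the average row sizes; a sketch, with md5() standing in for the random strings:
-- Fill two_integers with the same 10 million rows, then compare average row sizes
INSERT INTO two_integers (description_id, one_more_int, description)
SELECT i, i, md5(random()::text)
FROM generate_series(1, 10000000) AS i;

select
    pg_relation_size('two_integers')/10000000 as two_int_row,
    pg_relation_size('description_int')/10000000 as one_int_row;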
Find more details in Calculating and saving space in PostgreSQL.
Upvotes: 20