Reputation: 16656
I want to increase my database performance. In a project, all tables went from int to bigint, which I think is a bad choice not only regarding storage, since int requires 4 bytes and bigint requires 8 bytes, but also regarding performance.
So I created a small table with 10 million entries, using a Python script:
import uuid

rows = 10000000
output = 'insert_description_bigint.sql'

# write one INSERT statement per row into a SQL file
with open(output, 'w') as f:
    f.write("SET search_path = norma;\n")
    # range() excludes the upper bound, so rows + 1 yields all 10 million ids
    for i in range(1, rows + 1):
        random_string = uuid.uuid4()
        query = "insert into description_bigint (description_id, description) values (%d, '%s');\n"
        f.write(query % (i, random_string))
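As a side note, a set-based load with generate_series would populate the table much faster than 10 million single-row INSERT statements; a minimal sketch, with md5() standing in for the Python uuid strings:
-- Sketch: bulk-load the same 10 million rows in one statement
SET search_path = norma;
INSERT INTO description_bigint (description_id, description)
SELECT i, md5(random()::text)
FROM generate_series(1, 10000000) AS i;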
And this is how I created my two tables:
-- BIGINT
DROP TABLE IF EXISTS description_bigint;
CREATE TABLE description_bigint
(
    description_id BIGINT PRIMARY KEY NOT NULL,
    description VARCHAR(200),
    CONSTRAINT description_id_positive CHECK (description_id >= 0)
);
select count(1) from description_bigint;
select * from description_bigint;
select * from description_bigint where description_id = 9999999;
-- INT
DROP TABLE IF EXISTS description_int;
CREATE TABLE description_int
(
    description_id INT PRIMARY KEY NOT NULL,
    description VARCHAR(200),
    CONSTRAINT description_id_positive CHECK (description_id >= 0)
);
After inserting all this data, I ran a query on both tables to measure the difference between them. And to my surprise, they both have the same performance:
select * from description_bigint; -- 11m55s
select * from description_int; -- 11m55s
Am I doing something wrong with my benchmark? Shouldn't int be faster than bigint? Especially since the primary key is by definition an index, which means creating an index for bigint should be slower than creating an index for int with the same amount of data, right?
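For reference, client-side timing of a full-table select includes transfer and rendering time; a sketch of how the server-side execution time could be measured instead (EXPLAIN ANALYZE is standard PostgreSQL):
-- Server-side execution time only, excluding client transfer and rendering
EXPLAIN ANALYZE select * from description_int;
EXPLAIN ANALYZE select * from description_bigint;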
I know that this small thing alone won't make a huge impact on my database's performance, but I want to ensure that we are using best practices and staying focused on performance here.
Upvotes: 18
Views: 17201
Reputation: 121594
On a 64-bit system the two tables are nearly identical. The column description_id in description_int occupies 8 bytes (4 for the integer and 4 as alignment padding). Try this test:
select
    pg_relation_size('description_int')/10000000 as table_int,
    pg_relation_size('description_bigint')/10000000 as table_bigint,
    pg_relation_size('description_int_pkey')/10000000 as index_int,
    pg_relation_size('description_bigint_pkey')/10000000 as index_bigint;
The average row size of both tables is virtually the same. This is because the integer column occupies 8 bytes (4 bytes for the value and 4 bytes of alignment padding), exactly like bigint (8 bytes for the value with no filler). The same applies to index entries. This is a special case, however. If we add one more integer column to the first table:
CREATE TABLE two_integers
(
    description_id INT PRIMARY KEY NOT NULL,
    one_more_int INT,
    description VARCHAR(200),
    CONSTRAINT description_id_positive CHECK (description_id >= 0)
);
the average row size should remain the same, because the first 8 bytes will now hold the two integers with no filler.
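One way to check is to load the new table with the same data and compare the average row sizes; a sketch, with md5() standing in for the random strings:
-- Fill two_integers with the same 10 million rows, then compare average row sizes
INSERT INTO two_integers (description_id, one_more_int, description)
SELECT i, i, md5(random()::text)
FROM generate_series(1, 10000000) AS i;

select
    pg_relation_size('two_integers')/10000000 as two_int_row,
    pg_relation_size('description_int')/10000000 as one_int_row;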
Find more details in Calculating and saving space in PostgreSQL.
Upvotes: 20