Reputation: 11
A bit of a theoretical questions.
Was just wondering is there a way to optimize data fields?
Say for a given field you only have 3 possible strings but those are for some reason very long (say 50 characters), declaring that field a character_varying(50) seems like a lot of disk space wasted since data would essentially fit on 2 bits.
I guess you can solve the problem by JOINing label tables but is there another more proper way or do databases are able to auto-optimize that kind of column by themselves?
Are common databases capable to handle that kind of optimization of their own? Is there a way to declare that kind of structure in the DB (something like R-language factor concept)? Does Postgresql Domain structure helps any bit with optimization?
SOME CONTEXT:
Before you think it's a silly question. I've been working with old legacy systems (early 90s) where everything was heavily encoded in order to save memory and performance (e.g. sex would be coded (1,2) instead of (male,female) and many less obvious encoding).
Now we're moving system to a more modern DB (postgresql), hopefully we'll be able to use readable "plain text" fields.
I'm not really concerned about actual performance. More of a theorethical question.
Upvotes: 1
Views: 321
Reputation: 324551
PostgreSQL's enums (enumerations) are for exactly this.
CREATE TYPE sex AS ENUM ('male', 'female', 'intersex', 'unspecified');
(Yes, I'm making a point with my example here. Application developers that still force binary sex selection need to be hit with a clue-stick, hard. Same with those who confuse "sex" (biological) and "gender" (sociological).)
The main limitations with enums are that they must contain a name
, not an arbitrary-length string, and that you cannot remove values, only append/insert them. In all standard PostgreSQL builds the NAMEDATALEN
is set at 63 bytes. So you don't get to use long strings:
regress=> CREATE TYPE long AS ENUM ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
ERROR: invalid enum label "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
DETAIL: Labels must be 63 characters or less.
Enums are encoded internally as int4
values:
regress=> SELECT pg_column_size( 'female'::sex );
pg_column_size
----------------
4
(1 row)
so it's actually more compact to store a "char"
:
select pg_column_size('m'::"char");
if you don't mind the loss of self-documenting readability and the inability to specify the sort order independently of the value. "char"
is a PostgreSQL extension for a 1-byte fixed-size character value, and must always be used quoted to distinguish it from the SQL-standard character
type that may be abbreviated as char
.
Upvotes: 1
Reputation: 1516
I think you're looking for the 'enum' data type which you have to specifically create, which will save the data as an integer but convert it to a string on SELECT
e.g.
CREATE TYPE my_specific_text_field AS ENUM
(
'string one with longish text',
'second string with fairly long text',
'third string'
);
CREATE TABLE test (
id serial not null primary key,
myenum my_specific_text_field
);
INSERT INTO test (myenum) VALUES ('string one with longish text');
That said, enums can be a little cumbersome if you are unfamiliar with them, exporting enums can be tricky and also I believe their length is capped at 63 bytes.
Upvotes: 1