Reputation: 27286
I am using a PostgreSQL
database to log various aspects of a long-running system I have and one particular item I am logging is the CLASSPATH
of the various sub-ordinate processes the system spawns .
Each of these CLASSPATH
s can be a good one thousand characters, maybe more, and the system can fire up many thousand subordinate processes every day. So having a VARCHAR
column to store all these CLASSPATH
s can be a waste of space unless PostgreSQL somehow interns those strings (given that most of the time they will be the same).
Is there a way to deal with this at the DB level transparently to the application, without having to implement such interning (string reuse) logic at the application layer?
And, if so, is the interning support clever enough to recognize common parts (e.g. you can have millions of long VARCHAR
or TEXT
values that are not equal, yet only differ in a few lines)?
Upvotes: 3
Views: 1070
Reputation: 30597
Postgresql does not internally intern strings.
You can break your current single table into two tables, one containing unique values for the CLASSPATH and the other containing the other columns. The second table uses a foreign key reference to refer to the values in the first. Something like this:
create table paths(
id serial primary key,
classpath text unique
);
create table log(
id serial primary key,
somedata text,
classpath_id int references test1(id)
);
Next you create a view which allows the application to see the data:
create view log_view(id, somedata, classpath) as
select log.id, log.somedata, paths.classpath
from log
left join paths on log.classpath_id=paths.id;
Now your application can query in the same way as it queried your original data.
Updating the tables is slightly more complex. Postgresql (9.3+) does support updatable views, but they are only automatically updatable if there is a single table in the FROM clause - and we have two. In this case you would have to write triggers or rules which replace updates on the view with corresponding updates to the tables.
There are simple examples of this in the postgresql manual. I found a more complete example here.
Upvotes: 4