Tomas Greif
Tomas Greif

Reputation: 22661

How to approach data warehouse (PostgreSQL) documentation?

We do have a small data warehouse in PostgreSQL database and I have to document all the tables.

I thought I can add a comment to every column and table and use pipe "|" separator to add more attributes. Then I can use information schema and array function to get documentation and use any reporting software to create desired output.

select
    ordinal_position,
    column_name,
    data_type,
    character_maximum_length,
    numeric_precision,
    numeric_scale,
    is_nullable,
    column_default,
    (string_to_array(descr.description,'|'))[1] as cs_name,
    (string_to_array(descr.description,'|'))[2] as cs_description,
    (string_to_array(descr.description,'|'))[3] as en_name,
    (string_to_array(descr.description,'|'))[4] as en_description,
    (string_to_array(descr.description,'|'))[5] as other
from 
    information_schema.columns columns
    join pg_catalog.pg_class klass on (columns.table_name = klass.relname and klass.relkind = 'r')
    left join pg_catalog.pg_description descr on (descr.objoid = klass.oid and descr.objsubid = columns.ordinal_position)
where 
    columns.table_schema = 'data_warehouse'
order by 
    columns.ordinal_position;

It is a good idea or is there better approach?

Upvotes: 1

Views: 538

Answers (2)

Tomas Greif
Tomas Greif

Reputation: 22661

If anyone interested, here is what I've used for initial load for my small documentation project. Documentation is in two tables, one for describing tables and one for describing columns and constraints. I appreciate any feedback.

/* -- Initial Load - Tables */

drop table dw_description_table cascade;

create table dw_description_table  (
  table_description_key serial primary key,
  physical_full_name character varying,
  physical_schema_name character varying,
  physical_table_name  character varying,
  Table_Type  character varying,   -- Fact Dimension   ETL Transformation
  Logical_Name_CS character varying,
  Description_CS character varying,
  Logical_Name_EN character varying,
  Description_EN character varying,
  ToDo  character varying,
  Table_Load_Type character varying,   --Manually TruncateLoad  AddNewRows
  Known_Exclusions character varying,
  Table_Clover_Script character varying
);



insert into dw_description_table (physical_full_name, physical_schema_name, physical_table_name) (

select 
    table_schema || '.' || table_name as physical_full_name, 
    table_schema, 
    table_name 
from 
    information_schema.tables 
where 
    table_name like 'dw%' or table_name like 'etl%'
)


/* -- Initial Load - Columns */


CREATE TABLE dw_description_column (
  column_description_key serial,
  table_description_key bigint,
  physical_full_name text,
  physical_schema_name character varying,
  physical_table_name  character varying,
  physical_column_name  character varying,
  ordinal_position  character varying,
  column_default  character varying,
  is_nullable  character varying,
  data_type  character varying,
  logical_name_cs  character varying,
  description_cs  character varying,
  logical_name_en  character varying,
  description_en  character varying,
  derived_rule  character varying,
  todo  character varying,
  pk_name  character varying,
  fk_name  character varying,
  foreign_table_name  character varying,
  foreign_column_name  character varying,
  is_primary_key boolean,
  is_foreign_key boolean,
  CONSTRAINT dw_description_column_pkey PRIMARY KEY (column_description_key ),
  CONSTRAINT fk_dw_description_table_key FOREIGN KEY (table_description_key)
      REFERENCES dw_description_table (table_description_key) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION  
);




insert into dw_description_column  ( 
    table_description_key ,  
    physical_full_name ,  
    physical_schema_name ,  
    physical_table_name  ,  
    physical_column_name ,  
    ordinal_position  ,  
    column_default  ,
    is_nullable  ,
    data_type  ,  
    logical_name_cs  ,  
    description_cs  ,  
    logical_name_en  ,  
    description_en  ,  
    derived_rule  ,  
    todo  ,  
    pk_name  ,
    fk_name  ,
    foreign_table_name  ,  
    foreign_column_name  ,  
    is_primary_key ,  
    is_foreign_key ) 

(

with

dw_constraints as (
SELECT  
    tc.constraint_name, 
    tc.constraint_schema || '.' || tc.table_name || '.' || kcu.column_name as physical_full_name,  
    tc.constraint_schema,
    tc.table_name, 
    kcu.column_name, 
    ccu.table_name AS foreign_table_name, 
    ccu.column_name AS foreign_column_name,
    TC.constraint_type
FROM 
    information_schema.table_constraints AS tc  
    JOIN information_schema.key_column_usage AS kcu ON (tc.constraint_name = kcu.constraint_name and tc.table_name = kcu.table_name)
    JOIN information_schema.constraint_column_usage AS ccu ON ccu.constraint_name = tc.constraint_name
WHERE 
    constraint_type in ('PRIMARY KEY','FOREIGN KEY')
    AND tc.constraint_schema = 'bizdata'
    and (tc.table_name like 'dw%' or tc.table_name like 'etl%')
group by
    tc.constraint_name, 
    tc.constraint_schema,
    tc.table_name, 
    kcu.column_name, 
    ccu.table_name ,
    ccu.column_name,
    TC.constraint_type

)

select 
    dwdt.table_description_key,
    col.table_schema || '.' || col.table_name || '.' || col.column_name as physical_full_name, 
    col.table_schema as physical_schema_name, 
    col.table_name as physical_table_name, 
    col.column_name as physical_column_name, 
    col.ordinal_position, 
    col.column_default, 
    col.is_nullable, 
    col.data_type,
    null as Logical_Name_CS ,
    null as Description_CS ,
    null as Logical_Name_EN,
    null as Description_EN ,
    null as Derived_Rule ,
    null as ToDo,
    dwc1.constraint_name pk_name,
    dwc2.constraint_name as fk_name,
    dwc2.foreign_table_name,
    dwc2.foreign_column_name,
    case when dwc1.constraint_name is not null then true else false end as is_primary_key,
    case when dwc2.constraint_name is not null then true else false end as foreign_key
from 
    information_schema.columns col
    join dw_description_table dwdt on (col.table_schema || '.' || col.table_name = dwdt.physical_full_name )
    left join dw_constraints dwc1 on ((col.table_schema || '.' || col.table_name || '.' || col.column_name) = dwc1.physical_full_name and dwc1.constraint_type = 'PRIMARY KEY')
    left join dw_constraints dwc2 on ((col.table_schema || '.' || col.table_name || '.' || col.column_name) = dwc2.physical_full_name and dwc2.constraint_type = 'FOREIGN KEY') 
where 
    col.table_name like 'dw%' or col.table_name like 'etl%'
)

Upvotes: 1

Wayne Conrad
Wayne Conrad

Reputation: 108199

Unless you must include descriptions of the system tables, I wouldn't try to shoehorn your descriptions into pg_catalog.pg_description. Make your own table. That way you get to keep the columns as columns, and not have to use clunky string functions.

Alternatively, consider adding specially formatted comments to your master schema file, along the lines of javadoc. Then write a tool to extract those comments and create a document. That way the comments stay close to the thing they're commenting, and you don't have to mess with the database at all to produce the report. For example:

--* Used for authentication.
create table users
(
  --* standard Rails-friendly primary key.  Also an example of
  --* a long comment placed before the item, rather than on the 
  --* the same line.
  id serial primary key,
  name text not null,     --* Real name (hopefully)
  login text not null,    --* Name used for authentication
  ...
);

Your documentation tool reads the file, looks for the --* comments, figures out what comments go with what things, and produces some kind of report, e.g.:

table users: Used for authentication
  id: standard Rails-friendly primary key.  Also an example of a
      long comment placed before the item, rather than on the same
      line.
  name: Real name
  login: Name used for authentication

You might note that with appropriate comments, the master schema file itself is a pretty good report in its own right, and that perhaps nothing else is needed.

Upvotes: 3

Related Questions