Reputation: 1907

DB2 equivalency for Teradata clustered tables

In Teradata the definition of a clustered table is very simple and exists in a single syntax (correct me if I'm wrong).

CREATE TABLE table_name (charcol1 varchar(10), idcol integer)
primary index (idcol);

I am reading that there are a whole number of "clustered" table types in DB2 with respect to creating a share-nothing physical data layout for the object.

Defined primary index in TD will spread the table data across all nodes/virtual cpus available to the TD server.

What is the closest method (and syntax) to achieve the same within DB2?

Upvotes: 0

Answers (3)

Jeremy Rickard

Reputation: 3

Is this for a OLAP system or OLTP?

Shared nothing with DB2 is rarely used for OLTP. There are colocation considerations, and importantly the cost of a commit is a good deal higher (2 phase commit).

If this is an OLAP system then are you considering column-organised tables? You may not need to partition.

Upvotes: 0

Mark Barinstein

Reputation: 12314

Tables are created in tablespaces, tablespaces are created in database partition groups (or sets of nodes, which you can create as you want) in Db2 for LUW.
You may use the following query to understand which tablespaces reside in which database partition groups.

SELECT 
--  T.TBSPACEID, T.TBSPACETYPE
  CASE T.DATATYPE 
  WHEN 'A' THEN 'REGULAR'
  WHEN 'L' THEN 'LARGE'
  WHEN 'T' THEN 'SYSTEMP'
  WHEN 'U' THEN 'USRTEMP'
  END DATATYPE
, T.DBPGNAME
--, T.PAGESIZE
, T.TBSPACE
--, B.BPNAME
--, G.MIN_NODENUM, G.MAX_NODENUM, G.NODE_COUNT
, G.NODES_LIST
FROM SYSCAT.TABLESPACES T
JOIN SYSCAT.BUFFERPOOLS B ON B.BUFFERPOOLID=T.BUFFERPOOLID
LEFT JOIN (
SELECT 
  DBPGNAME
, COUNT(*) NODE_COUNT
, MIN(DBPARTITIONNUM) MIN_NODENUM
, MAX(DBPARTITIONNUM) MAX_NODENUM
, LISTAGG(DBPARTITIONNUM, ',') WITHIN GROUP (ORDER BY DBPARTITIONNUM) NODES_LIST 
FROM SYSCAT.DBPARTITIONGROUPDEF
GROUP BY DBPGNAME
) G ON G.DBPGNAME=T.DBPGNAME
ORDER BY T.TBSPACEID;

Let's say you get the following result:

DATATYPE  DBPGNAME        TBSPACE    NODES_LIST                                          
--------  --------------- ---------- ----------
...
LARGE     IBMDEFAULTGROUP USERSPACE1 0,1,2,3
...

This means that if you run the following statement, then the table is created in a tablespace USERSPACE1 which resides on database partitions (nodes) 0-3, and table data is distributed between these nodes based on a hash value computed on values of YEAR column.

CREATE TABLE SALES 
(
  CUSTOMER   VARCHAR(80)
, REGION     CHAR(5)
, YEAR       INTEGER
)
IN USERSPACE1
DISTRIBUTE BY HASH (YEAR);

It's always advisable to specify a tablespase explicitly, unless the rules of choosing such a tablespace which Db2 uses in case of absence of such a specification are applicable for your particular statement.

Upvotes: 1

data_henrik

Reputation: 17118

I would recommend to start with this overview of data partitioning (concepts) for Db2.

The following three clauses demonstrate the levels of data organization that can be used together in any combination:

DISTRIBUTE BY to spread data evenly across database partitions (to enable intraquery parallelism and to balance the load across each database partition) (database partitioning)

PARTITION BY to group rows with similar values of a single dimension in the same data partition (table partitioning)

ORGANIZE BY to group rows with similar values on multiple dimensions in the same table extent (multidimensional clustering) or to group rows according to the time of the insert operation (insert time clustering table).

All three clauses can be used within the same CREATE TABLE statement. Its usage depends on the data and the envisioned queries.

The database partitioning (DISTRIBUTE BY) is the shared-nothing you asked for. Distribute the data across partitions based on the YEAR.

 CREATE TABLE SALES
     (CUSTOMER   VARCHAR(80),
      REGION     CHAR(5),
      YEAR       INTEGER)
   DISTRIBUTE BY HASH (YEAR)

Upvotes: 2

DB2 equivalency for Teradata clustered tables

Answers (3)

Related Questions