Kishor Bachhav

Reputation: 181

Memsql TPCH queries

I am new to MemSQL and am trying to run the TPCH DDL statements on it. I have not been able to convert 5 of the 8 TPCH table creation statements to MemSQL, because I cannot express the foreign key relationships using MemSQL's FOREIGN SHARD KEY concept. I have tried hard but keep running into different issues. Please help me convert those 5 statements. The 8 original statements are below.

CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL PRIMARY KEY,
                   R_NAME       CHAR(25) NOT NULL,
                   R_COMMENT    VARCHAR(152)
                   );

CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL PRIMARY KEY,
                   N_NAME       CHAR(25) NOT NULL,
                   N_REGIONKEY  INTEGER NOT NULL REFERENCES REGION(R_REGIONKEY),
                   N_COMMENT    VARCHAR(152)
                   );

CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL PRIMARY KEY,
                 P_NAME        VARCHAR(55) NOT NULL,
                 P_MFGR        CHAR(25) NOT NULL,
                 P_BRAND       CHAR(10) NOT NULL,
                 P_TYPE        VARCHAR(25) NOT NULL,
                 P_SIZE        INTEGER NOT NULL,
                 P_CONTAINER   CHAR(10) NOT NULL,
                 P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                 P_COMMENT     VARCHAR(23) NOT NULL
                 );

CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL PRIMARY KEY,
                    S_NAME        CHAR(25) NOT NULL,
                    S_ADDRESS     VARCHAR(40) NOT NULL,
                    S_NATIONKEY   INTEGER NOT NULL REFERENCES NATION(N_NATIONKEY),
                    S_PHONE       CHAR(15) NOT NULL,
                    S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                    S_COMMENT     VARCHAR(101) NOT NULL
                    );

CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL REFERENCES PART(P_PARTKEY),
                    PS_SUPPKEY     INTEGER NOT NULL REFERENCES SUPPLIER(S_SUPPKEY),
                    PS_AVAILQTY    INTEGER NOT NULL,
                    PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                    PS_COMMENT     VARCHAR(199) NOT NULL,
                    PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY)
                    );

CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL PRIMARY KEY,
                    C_NAME        VARCHAR(25) NOT NULL,
                    C_ADDRESS     VARCHAR(40) NOT NULL,
                    C_NATIONKEY   INTEGER NOT NULL REFERENCES NATION(N_NATIONKEY),
                    C_PHONE       CHAR(15) NOT NULL,
                    C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                    C_MKTSEGMENT  CHAR(10) NOT NULL,
                    C_COMMENT     VARCHAR(117) NOT NULL
                    );

CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL PRIMARY KEY,
                   O_CUSTKEY        INTEGER NOT NULL REFERENCES CUSTOMER(C_CUSTKEY),
                   O_ORDERSTATUS    CHAR(1) NOT NULL,
                   O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                   O_ORDERDATE      DATE NOT NULL,
                   O_ORDERPRIORITY  CHAR(15) NOT NULL,
                   O_CLERK          CHAR(15) NOT NULL,
                   O_SHIPPRIORITY   INTEGER NOT NULL,
                   O_COMMENT        VARCHAR(79) NOT NULL
                   );

CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL REFERENCES ORDERS(O_ORDERKEY),
                    L_PARTKEY     INTEGER NOT NULL REFERENCES PART(P_PARTKEY),
                    L_SUPPKEY     INTEGER NOT NULL REFERENCES SUPPLIER(S_SUPPKEY),
                    L_LINENUMBER  INTEGER NOT NULL,
                    L_QUANTITY    DECIMAL(15,2) NOT NULL,
                    L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                    L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                    L_TAX         DECIMAL(15,2) NOT NULL,
                    L_RETURNFLAG  CHAR(1) NOT NULL,
                    L_LINESTATUS  CHAR(1) NOT NULL,
                    L_SHIPDATE    DATE NOT NULL,
                    L_COMMITDATE  DATE NOT NULL,
                    L_RECEIPTDATE DATE NOT NULL,
                    L_SHIPINSTRUCT CHAR(25) NOT NULL,
                    L_SHIPMODE     CHAR(10) NOT NULL,
                    L_COMMENT      VARCHAR(44) NOT NULL,
                    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER),
                    FOREIGN KEY (L_PARTKEY,L_SUPPKEY) REFERENCES PARTSUPP(PS_PARTKEY, PS_SUPPKEY)
                    );

I am able to create the first 3 tables in MemSQL but not the remaining ones. The 1st and 3rd statements are very simple and worked as is. I was able to create the 2nd table with the statement below, but I am not sure whether this is the right way to do it.

CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                   N_NAME       CHAR(25) NOT NULL,
                   N_REGIONKEY  INTEGER NOT NULL,
                   N_COMMENT    VARCHAR(152),
                   FOREIGN SHARD KEY (N_REGIONKEY) REFERENCES REGION (R_REGIONKEY),
                   PRIMARY KEY (N_NATIONKEY, N_REGIONKEY)
                   );

Is it possible to create a table in MemSQL that is only replicated and not partitioned? If so, how?

Upvotes: 1

Views: 986

Answers (1)

Rob Walzer

Reputation: 349

Since MemSQL does not support referential integrity, foreign shard keys are an optimization aid and are not required. They do let you know at table creation time that two tables can be joined locally (no network traffic) on that key. However, the optimizer does not require foreign shard keys to take advantage of this data locality.

Starting with the ORDERS and LINEITEM tables:

CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL PRIMARY KEY,
               O_CUSTKEY        INTEGER NOT NULL,
               O_ORDERSTATUS    CHAR(1) NOT NULL,
               O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
               O_ORDERDATE      DATE NOT NULL,
               O_ORDERPRIORITY  CHAR(15) NOT NULL,
               O_CLERK          CHAR(15) NOT NULL,
               O_SHIPPRIORITY   INTEGER NOT NULL,
               O_COMMENT        VARCHAR(79) NOT NULL,
               KEY (O_CUSTKEY)
               );

CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                L_PARTKEY     INTEGER NOT NULL,
                L_SUPPKEY     INTEGER NOT NULL,
                L_LINENUMBER  INTEGER NOT NULL,
                L_QUANTITY    DECIMAL(15,2) NOT NULL,
                L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                L_TAX         DECIMAL(15,2) NOT NULL,
                L_RETURNFLAG  CHAR(1) NOT NULL,
                L_LINESTATUS  CHAR(1) NOT NULL,
                L_SHIPDATE    DATE NOT NULL,
                L_COMMITDATE  DATE NOT NULL,
                L_RECEIPTDATE DATE NOT NULL,
                L_SHIPINSTRUCT CHAR(25) NOT NULL,
                L_SHIPMODE     CHAR(10) NOT NULL,
                L_COMMENT      VARCHAR(44) NOT NULL,
                PRIMARY KEY (L_ORDERKEY,L_LINENUMBER),
                FOREIGN SHARD KEY (L_ORDERKEY) REFERENCES ORDERS (O_ORDERKEY),
                KEY (L_PARTKEY),
                KEY (L_SUPPKEY)
                );

In this case we know we can take advantage of a local join between ORDERS and LINEITEM, because they are both sharded on ORDERKEY. ORDERS and LINEITEM are the two biggest tables in TPCH, so we want to ensure that they can be joined locally. Since ORDERS' primary key is O_ORDERKEY, I do not need to specify a shard key for ORDERS. MemSQL will shard by O_ORDERKEY automatically.
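
As a rough illustration of the kind of query this helps, here is a sketch of a join on the shared shard key, assuming the two tables above were created as shown. The date literal is just a placeholder value. Prefixing the query with EXPLAIN should let you inspect the plan; the exact output depends on your MemSQL version, so I am not showing it here.

```sql
-- Revenue per order. The join column (ORDERKEY) is the shard key of both
-- tables, so matching rows should live on the same partition.
EXPLAIN
SELECT o.O_ORDERKEY,
       SUM(l.L_EXTENDEDPRICE * (1 - l.L_DISCOUNT)) AS REVENUE
FROM ORDERS o
JOIN LINEITEM l ON l.L_ORDERKEY = o.O_ORDERKEY
WHERE o.O_ORDERDATE < '1995-03-15'   -- placeholder date
GROUP BY o.O_ORDERKEY;
```

Drop the EXPLAIN keyword to actually run the query.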

I've also put secondary indexes on the remaining foreign key columns. This is useful since there will be joins on the foreign keys.
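
For example, a lookup like the one below filters LINEITEM on L_PARTKEY, which is not part of its primary key, so the secondary index KEY (L_PARTKEY) is what would be expected to serve it. This is only a sketch: the part key value is made up, and whether the index is actually chosen is up to the optimizer.

```sql
-- Find all line items for one part; 1234 is a hypothetical part key.
SELECT L_ORDERKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE
FROM LINEITEM
WHERE L_PARTKEY = 1234;
```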

Applying these concepts to PART, PARTSUPP, SUPPLIER, and CUSTOMER:

CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL PRIMARY KEY,
                C_NAME        VARCHAR(25) NOT NULL,
                C_ADDRESS     VARCHAR(40) NOT NULL,
                C_NATIONKEY   INTEGER NOT NULL,
                C_PHONE       CHAR(15) NOT NULL,
                C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                C_MKTSEGMENT  CHAR(10) NOT NULL,
                C_COMMENT     VARCHAR(117) NOT NULL,
                KEY(C_NATIONKEY)
                );


CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL PRIMARY KEY,
                S_NAME        CHAR(25) NOT NULL,
                S_ADDRESS     VARCHAR(40) NOT NULL,
                S_NATIONKEY   INTEGER NOT NULL,
                S_PHONE       CHAR(15) NOT NULL,
                S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                S_COMMENT     VARCHAR(101) NOT NULL,
                KEY(S_NATIONKEY)
                );

CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL PRIMARY KEY,
             P_NAME        VARCHAR(55) NOT NULL,
             P_MFGR        CHAR(25) NOT NULL,
             P_BRAND       CHAR(10) NOT NULL,
             P_TYPE        VARCHAR(25) NOT NULL,
             P_SIZE        INTEGER NOT NULL,
             P_CONTAINER   CHAR(10) NOT NULL,
             P_RETAILPRICE DECIMAL(15,2) NOT NULL,
             P_COMMENT     VARCHAR(23) NOT NULL
             );

CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                PS_SUPPKEY     INTEGER NOT NULL,
                PS_AVAILQTY    INTEGER NOT NULL,
                PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                PS_COMMENT     VARCHAR(199) NOT NULL,
                PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY),
                SHARD KEY(PS_PARTKEY),
                KEY(PS_SUPPKEY)
                );

Although PARTSUPP and PART are both sharded on the same key (PARTKEY), I do not need to specify a foreign shard key in order to take advantage of a local join between them on that key; the optimizer will pick that up automatically.
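
A sketch of a query that should benefit from this, assuming PART and PARTSUPP were created as above. The column alias MIN_COST is just a name I picked for the example.

```sql
-- Cheapest supply cost per part. Both tables are sharded on PARTKEY,
-- so the join can be done without a FOREIGN SHARD KEY declaration.
SELECT p.P_PARTKEY,
       p.P_NAME,
       MIN(ps.PS_SUPPLYCOST) AS MIN_COST
FROM PART p
JOIN PARTSUPP ps ON ps.PS_PARTKEY = p.P_PARTKEY
GROUP BY p.P_PARTKEY, p.P_NAME;
```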

In response to your final question: yes, MemSQL allows you to create a replicated table instead of a partitioned table. This is called a reference table, and it is useful for the NATION and REGION tables since they are very small. Reference tables are not necessary to run distributed queries, but they are a useful optimization.

CREATE REFERENCE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL PRIMARY KEY,
               R_NAME       CHAR(25) NOT NULL,
               R_COMMENT    VARCHAR(152)
               );

CREATE REFERENCE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL PRIMARY KEY,
               N_NAME       CHAR(25) NOT NULL,
               N_REGIONKEY  INTEGER NOT NULL,
               N_COMMENT    VARCHAR(152)
               );
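
To give an idea of where this pays off, here is a sketch of a query that joins the partitioned CUSTOMER table to both reference tables, assuming all three were created as shown in this answer. Since NATION and REGION are replicated, the joins should not require moving CUSTOMER rows between nodes. The alias NUM_CUSTOMERS is only illustrative.

```sql
-- Number of customers per region, joining through the two reference tables.
SELECT r.R_NAME,
       COUNT(*) AS NUM_CUSTOMERS
FROM CUSTOMER c
JOIN NATION n ON n.N_NATIONKEY = c.C_NATIONKEY
JOIN REGION r ON r.R_REGIONKEY = n.N_REGIONKEY
GROUP BY r.R_NAME;
```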

For more documentation on everything described, check out: http://docs.memsql.com/latest/concepts/distributed_sql/

Upvotes: 4
