Madhankkumar S A

Reputation: 47

Different types of imaging techniques in Kognitio

Can anyone please explain to me which imaging techniques are available in Kognitio?

It would be great if you could cover the techniques below.

1) Replicated

2) REPLICATED PARTITION IMAGE BY (column_name)

3) HASHED ON (column_name) PARTITION IMAGE BY (column_name)

Thanks in advance.

Upvotes: 1

Views: 95

Answers (2)

Srini V

Reputation: 11375

We have four different imaging options in WX2:

- Random: even, round-robin distribution (the default)
- Hashed: rows are placed onto RAM Stores according to a key
- Partial hashed: as hashed, but handles skewed attributes
- Replicated: a complete copy on each RAM Store
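To make these options concrete, here is a small Python simulation of how rows might be spread across RAM Stores under random, hashed and replicated imaging. It is only a sketch under stated assumptions, not Kognitio's implementation: the store count (`NUM_STORES`), the CRC32-based `store_for` hash and all function names are hypothetical.

```python
import zlib
from itertools import cycle

NUM_STORES = 4  # hypothetical number of RAM Stores


def store_for(value, num_stores=NUM_STORES):
    """Map a key value to a store index (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(str(value).encode()) % num_stores


def distribute_random(rows, num_stores=NUM_STORES):
    """Random: deal rows out round robin, ignoring their contents."""
    stores = [[] for _ in range(num_stores)]
    for row, i in zip(rows, cycle(range(num_stores))):
        stores[i].append(row)
    return stores


def distribute_hashed(rows, key, num_stores=NUM_STORES):
    """Hashed: the target store depends only on the key column's value."""
    stores = [[] for _ in range(num_stores)]
    for row in rows:
        stores[store_for(row[key], num_stores)].append(row)
    return stores


def distribute_replicated(rows, num_stores=NUM_STORES):
    """Replicated: every store holds a complete copy of the rows."""
    return [list(rows) for _ in range(num_stores)]


rows = [{"id": i, "region": i % 3} for i in range(10)]
print([len(s) for s in distribute_random(rows)])      # [3, 3, 2, 2]
print([len(s) for s in distribute_hashed(rows, "region")])
print([len(s) for s in distribute_replicated(rows)])  # [10, 10, 10, 10]
```

Random gives the most even spread. Hashed keeps equal keys together (every row with the same `region` lands on one store), at the cost of possible imbalance. Replicated multiplies the RAM needed by the number of stores.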

Replication puts a copy of the image on every RAM Store. It can be costly in terms of RAM and redistribution time, so it is best suited to small lookup/dimension tables. A replicated image cannot be fragmented. Replication is required for theta joins. It is per RAM Store, not per node.
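Why theta joins need replication can be illustrated with a toy band join (a non-equality predicate `lo <= amount < hi`). This Python sketch assumes each store evaluates the join only against the rows it holds locally; the store count, the tables and the function names are hypothetical. With the small table replicated, the per-store results add up to the full answer; with it spread across stores, matches are lost.

```python
NUM_STORES = 4  # hypothetical number of RAM Stores


def round_robin(rows, n=NUM_STORES):
    """Spread rows evenly: store i gets rows i, i+n, i+2n, ..."""
    return [rows[i::n] for i in range(n)]


def band_join(amounts, bands):
    """Theta (non-equality) join: amount falls inside a band's [lo, hi) range."""
    return [(a, b["name"]) for a in amounts for b in bands if b["lo"] <= a < b["hi"]]


amounts = list(range(0, 100, 7))  # the "large" table, 15 rows
bands = [{"name": n, "lo": lo, "hi": lo + 25} for n, lo in zip("ABCD", range(0, 100, 25))]
expected = band_join(amounts, bands)  # the answer a single machine would give

amount_stores = round_robin(amounts)

# Replicated dimension: each store joins its amounts against ALL bands.
replicated = [p for store in amount_stores for p in band_join(store, bands)]

# Distributed dimension: each store only sees its own slice of the bands.
band_stores = round_robin(bands)
distributed = [p for a, b in zip(amount_stores, band_stores) for p in band_join(a, b)]

print(len(expected), len(replicated), len(distributed))  # 15 15 4
```

Hashing does not help here, because a range predicate gives no single key value that both sides could be hashed on. Note also that the replicated copy costs four times the RAM of the band table, which is why replication suits small lookup/dimension tables.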

Hashing distributes the rows of a table or view image across the RAM Stores based on the value of one or more columns. It is good for joining large tables: hash both tables on their common key. It can lead to skew, for example when the number of distinct values is less than the number of RAM Stores, or when one or two values greatly exceed the others in frequency. Partial distribution may be used to neutralize value skew.
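The two skew scenarios above can be reproduced with a few lines of Python. This is a sketch under assumptions: `NUM_STORES` and the CRC32-based `store_for` hash are hypothetical stand-ins, not Kognitio's actual hashing.

```python
import zlib
from collections import Counter

NUM_STORES = 8  # hypothetical number of RAM Stores


def store_for(value, n=NUM_STORES):
    """Assumed stand-in hash: CRC32 of the value's text, modulo the store count."""
    return zlib.crc32(str(value).encode()) % n


def rows_per_store(keys, n=NUM_STORES):
    """How many rows each store receives when hashing on these key values."""
    counts = Counter(store_for(k, n) for k in keys)
    return [counts[i] for i in range(n)]


# Case 1: only 3 distinct values for 8 stores, so at least 5 stores stay empty.
few_values = [i % 3 for i in range(900)]
# Case 2: one value ("UNKNOWN") accounts for 900 of 1000 rows.
dominant = ["UNKNOWN"] * 900 + [f"cust{i}" for i in range(100)]

for label, keys in [("few distinct values", few_values), ("one dominant value", dominant)]:
    load = rows_per_store(keys)
    busiest = max(load) / (len(keys) / NUM_STORES)
    print(f"{label}: {load} (busiest store holds {busiest:.1f}x the even share)")
```

In both cases the busiest store does several times its even share of the work, and a parallel query typically has to wait for its slowest store.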

Partial hashing is a mechanism for handling joins when a large table is severely skewed on its key column(s). It is an alternative to straightforward hashing. There are two types: partial hashed/random across RAM Stores and partial hashed/replicated across RAM Stores.
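One plausible reading of how the two partial types work together, sketched in Python: the large table sends its known-skewed key values round robin and hashes the rest (partial hashed/random), while the table it joins to copies its rows for those skewed values to every store and hashes the rest (partial hashed/replicated). Every equality match then still meets on some store. The skewed-value set, store count, hash and function names are all assumptions for illustration; check the Kognitio documentation for the real semantics.

```python
import zlib
from itertools import count

NUM_STORES = 4  # hypothetical number of RAM Stores


def store_for(value, n=NUM_STORES):
    """Assumed stand-in hash: CRC32 of the value's text, modulo the store count."""
    return zlib.crc32(str(value).encode()) % n


def partial_hashed_random(rows, key, skewed, n=NUM_STORES):
    """Large table: skewed key values go round robin, all other values are hashed."""
    stores = [[] for _ in range(n)]
    next_rr = count()
    for row in rows:
        target = next(next_rr) % n if row[key] in skewed else store_for(row[key], n)
        stores[target].append(row)
    return stores


def partial_hashed_replicated(rows, key, skewed, n=NUM_STORES):
    """Joining table: rows with skewed key values go to every store, the rest are hashed."""
    stores = [[] for _ in range(n)]
    for row in rows:
        targets = range(n) if row[key] in skewed else [store_for(row[key], n)]
        for t in targets:
            stores[t].append(row)
    return stores


def local_join(left, right, key):
    """Equality join of two row lists held on the same store."""
    by_key = {}
    for right_row in right:
        by_key.setdefault(right_row[key], []).append(right_row)
    return [(lr, rr) for lr in left for rr in by_key.get(lr[key], [])]


skewed = {"UNKNOWN"}
sales = [{"cust": "UNKNOWN", "amt": i} for i in range(40)] + [
    {"cust": f"c{i}", "amt": i} for i in range(10)
]
customers = [{"cust": "UNKNOWN"}] + [{"cust": f"c{i}"} for i in range(10)]

big = partial_hashed_random(sales, "cust", skewed)
small = partial_hashed_replicated(customers, "cust", skewed)
joined = [p for b, s in zip(big, small) for p in local_join(b, s, "cust")]

print(len(joined))  # 50: every sale meets its customer exactly once
print([sum(r["cust"] == "UNKNOWN" for r in s) for s in big])  # [10, 10, 10, 10]
```

The dominant `UNKNOWN` rows are now spread evenly instead of piling onto one store, and only the single matching customer row is replicated rather than the whole table.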

Upvotes: 1

mc110

Reputation: 2833

The Kognitio community forum article here has links to all the latest documentation.

In particular, chapter 2 of the Kognitio Guide covers the various table and view image options that exist.

The ones referred to in the original question are:

  1. replicated - here a copy of the object is placed in every ram store process. This is typically used for dimension objects to allow them to be joined to large objects, regardless of whether those objects are randomly distributed or hashed.
  2. partitioned (deciding whether to partition or not is independent of whether you are replicating/randomising/hashing) - this allows the ram store to partition on an attribute. The main benefit is that partitions can be eliminated on scans, reducing the amount of data processed. Note the further comments in the documentation on partitioning though.
  3. hashed - hashing on an attribute allows data to be distributed according to that attribute value. For example, in a retail scenario you might hash the customer table by customer_id and do the same with the transaction table; any given transaction is then located on the same ram store as the relevant customer record. Note that this distribution is prone to skew, so consult the documentation for details on using partial distributions to defeat skew.
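A small Python sketch of points 2 and 3 together, loosely in the spirit of the question's `HASHED ON (column_name) PARTITION IMAGE BY (column_name)`: customers and transactions are both hashed on `customer_id` (so each transaction sits with its customer), and each store's transactions are then partitioned by a hypothetical `month` column so that a filter on one month skips the other partitions. The store count, hash, `month` column and helper names are assumptions, not Kognitio internals.

```python
import zlib
from collections import defaultdict

NUM_STORES = 4  # hypothetical number of RAM Stores


def store_for(value, n=NUM_STORES):
    """Assumed stand-in hash: CRC32 of the value's text, modulo the store count."""
    return zlib.crc32(str(value).encode()) % n


def hash_on(rows, key, n=NUM_STORES):
    """Place each row on the store chosen by hashing its key value."""
    stores = [[] for _ in range(n)]
    for row in rows:
        stores[store_for(row[key], n)].append(row)
    return stores


def partition_by(rows, column):
    """Within one store, group rows into partitions keyed on a column's value."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return parts


# Hashed: customers and transactions both hashed on customer_id.
customers = [{"customer_id": c} for c in range(20)]
transactions = [{"customer_id": t % 20, "month": t % 12 + 1} for t in range(240)]
cust_stores = hash_on(customers, "customer_id")
txn_stores = hash_on(transactions, "customer_id")
cust_ids = [{c["customer_id"] for c in store} for store in cust_stores]
colocated = all(
    t["customer_id"] in cust_ids[i] for i, store in enumerate(txn_stores) for t in store
)
print("every transaction is on its customer's store:", colocated)  # True

# Partitioned: a filter on month = 3 only needs one partition per store.
rows_read_full = rows_read_pruned = 0
for store in txn_stores:
    parts = partition_by(store, "month")
    rows_read_full += len(store)                # unpartitioned scan reads everything
    rows_read_pruned += len(parts.get(3, []))   # other months are eliminated
print(rows_read_full, rows_read_pruned)  # 240 20
```

Distribution (how rows are spread across stores) and partitioning (how rows are grouped within a store) are chosen independently, which is why the two steps are separate here.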

Upvotes: 2
