zcourts

Reputation: 5043

Mixing Apache Ignite BinaryObject with SQL tables

I'm evaluating Apache Ignite and trying to understand how the various models map to each other. From what I've found so far, regardless of the engine/API you use to access the data, the underlying storage is key-value pairs (is this right?).

A few questions have come to mind right now:

  1. Does each SQL table map to a separate cache, i.e. do create table a and create table b create two caches, a and b?

I ask because the current API lets you create a single cache instance and run multiple create table queries against it. Initially I thought this meant a cache was similar to a database construct in an RDBMS, but a comment in the example taken from the Ignite master branch (shown below) indicates otherwise.

  2. What are the practical limits on the number of caches you can create, or can you simply keep adding nodes to expand the number of caches in the grid?

  3. How do BinaryObjects relate to SQL tables? In the examples I've looked at in the source tree, it appears you can create a binary object and then access it via SQL, as long as you provide a mapping with the QueryEntity API.

  4. Are there benefits to using only BinaryObjects vs. tables? So far it seems to me that create table should just map to binary objects in the underlying implementation, with things like indexes, types and the QueryEntity mapping being done for you automatically.

  5. What are the naming limitations between binary/table types (both table/cache names and column/field names)? I saw in an example that it's possible to use something like a.b as a field name on a binary object, but it's unclear to me how you'd then access it via SQL, as I believe a name like that would collide with existing semantics.

  6. Is there a diagram/summary of the various constructs available in Ignite and their relationships to each other? Seeing something like that would drastically help tie together everything I've read so far. I'm currently reading "High Performance In-Memory Computing with Apache Ignite"; I haven't finished it, but from the contents page and what I've read so far I get the feeling it doesn't cover some of these.

Finally, with what is probably still a muddled understanding, I tried tweaking one of the Java examples to combine most of what I've asked about, but have so far failed to get it to work.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.QueryIndex;
import org.apache.ignite.cache.QueryIndexType;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.configuration.CacheConfiguration;

import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;

public class TableAndBinaryObjectCacheExperiment {
  private static final String cacheName = "some-cache-name";

  @SuppressWarnings({"unused", "ThrowFromFinallyBlock"})
  public static void main(String[] args) throws Exception {
    try (Ignite ignite = Ignition.start("ignite/ignite.xml")) {
      if (!ignite.cluster().active()) ignite.cluster().active(true);

      // Create dummy cache to act as an entry point for SQL queries (new SQL API which do not require this
      // will appear in future versions, JDBC and ODBC drivers do not require it already).
      CacheConfiguration<?, ?> cacheCfg = new CacheConfiguration<>(cacheName).setSqlSchema("PUBLIC");
      //
      LinkedHashMap<String, String> fields = new LinkedHashMap<>();
      fields.put("person_id", Long.class.getName());
      fields.put("name", String.class.getName());
      fields.put("address.postcode", String.class.getName());
      fields.put("age", Integer.class.getName());
      fields.put("about", String.class.getName());
      fields.put("misc", String.class.getName());

      QueryEntity testBinType = new QueryEntity();
      testBinType.setKeyType(String.class.getName());
      testBinType.setValueType("TestType");

      //primary key
      testBinType.setKeyType(Long.class.getName());
      testBinType.setKeyFieldName("test_id");

      testBinType.setFields(fields);
      testBinType.setTableName("test_type");
      testBinType.setIndexes(Arrays.asList(
        new QueryIndex("name"),
        new QueryIndex("address.postcode"),
        new QueryIndex("age"),
        new QueryIndex("about", QueryIndexType.FULLTEXT),
        new QueryIndex("person_id")
      ));

      CacheConfiguration<?, ?> binaryConf1 = new CacheConfiguration<>(cacheName);
      binaryConf1.setCacheMode(CacheMode.PARTITIONED);
      binaryConf1.setQueryEntities(Collections.singletonList(testBinType));
      //
      try (
        IgniteCache<?, ?> cache = ignite.getOrCreateCache(cacheCfg);
        IgniteCache<?, ?> binCacheX = ignite.getOrCreateCache(binaryConf1)
      ) {

        IgniteCache<?, ?> binCache = cache.withKeepBinary();
        // Create reference City table based on REPLICATED template.
        cache.query(new SqlFieldsQuery("CREATE TABLE IF NOT EXISTS city (id LONG PRIMARY KEY, name VARCHAR) WITH \"template=replicated\"")).getAll();
        // Create table based on PARTITIONED template with one backup.
        cache.query(new SqlFieldsQuery("CREATE TABLE IF NOT EXISTS person (id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id)) WITH \"backups=1, affinity_key=city_id\"")).getAll();
        // Create an index.
        cache.query(new SqlFieldsQuery("CREATE INDEX IF NOT EXISTS on Person (city_id)")).getAll();

        print("Created database objects.");

        SqlFieldsQuery qry = new SqlFieldsQuery("INSERT INTO city (id, name) VALUES (?, ?)");

        cache.query(qry.setArgs(1L, "Forest Hill")).getAll();
        cache.query(qry.setArgs(2L, "Denver")).getAll();
        cache.query(qry.setArgs(3L, "St. Petersburg")).getAll();

        qry = new SqlFieldsQuery("INSERT INTO person (id, name, city_id) values (?, ?, ?)");

        cache.query(qry.setArgs(1L, "John Doe", 3L)).getAll();
        cache.query(qry.setArgs(2L, "Jane Roe", 2L)).getAll();
        cache.query(qry.setArgs(3L, "Mary Major", 1L)).getAll();
        cache.query(qry.setArgs(4L, "Richard Miles", 2L)).getAll();

        qry = new SqlFieldsQuery("INSERT INTO test_type (test_id, name, age, about, \"address.postcode\") values (? ?, ?, ?, ?)");
        cache.query(qry.setArgs(1L, "Courtney", 12, "this is about me", "AB12CD", 3L));

        SqlFieldsQuery joinQuery = new SqlFieldsQuery(
          "SELECT p.name, c.name, t.about, \"t.address.postcode\" " +
            "FROM Person p " +
            "INNER JOIN City c on c.id = p.city_id " +
            "INNER JOIN test_type t on p.id = t.person_id " +
            "LIMIT 50");
        List<List<?>> res = cache.query(joinQuery).getAll();
        for (Object next : res)
          System.out.println(">>>    " + next);
      } finally {
        // Distributed cache can be removed from cluster only by #destroyCache() call.
        ignite.destroyCache(cacheName);
      }

      print("Cache query DDL example finished.");
    }
  }

  /**
   * Prints message.
   *
   * @param msg Message to print before all objects are printed.
   */
  private static void print(String msg) {
    System.out.println();
    System.out.println(">>> " + msg);
  }
}

I have read around the docs, but I've either missed these pieces of information or they're not immediately obvious.

Upvotes: 2

Views: 2732

Answers (3)

dmagda

Reputation: 1785

BinaryObject is Ignite's data serialization and storage format. You might think of it as JSON that is optimized and designed for Ignite's specifics.

The benefit of the format is that you can run computations and other operations on the server-node side while avoiding deserialization of the data into its class form.

Talking about SQL, all the data you add to the cluster via INSERTs, UPDATEs, etc. is stored in BinaryObject form as well.

If you'd like to mix SQL, key-value and compute APIs, create the table/cache with the CREATE TABLE command, overriding the cache name to the name you like with the CACHE_NAME parameter. Then use that cache name for your key-value, compute grid and other operations. Refer to this project that shows how the APIs can be mixed.
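As a sketch of that approach (the table, cache and value-type names below are illustrative; CACHE_NAME, KEY_TYPE and VALUE_TYPE are parameters of Ignite's CREATE TABLE WITH clause):

```sql
-- Back the table with an explicitly named cache instead of the
-- default SQL_PUBLIC_PERSON, so key-value and compute APIs can
-- address it by that name.
CREATE TABLE person (
  id LONG PRIMARY KEY,
  name VARCHAR
) WITH "CACHE_NAME=PersonCache,KEY_TYPE=java.lang.Long,VALUE_TYPE=Person";
```

After that, ignite.cache("PersonCache") returns the same data for key-value operations, and withKeepBinary() exposes the rows as BinaryObjects.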

Upvotes: 1

alamar

Reputation: 19313

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.cache.QueryIndex;
import org.apache.ignite.cache.QueryIndexType;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.configuration.CacheConfiguration;

import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;

public class TableAndBinaryObjectCacheExperiment {
  private static final String cacheName = "some-cache-name";

  @SuppressWarnings({"unused", "ThrowFromFinallyBlock"})
  public static void main(String[] args) throws Exception {
    try (Ignite ignite = Ignition.start("ignite/ignite.xml")) {
      if (!ignite.cluster().active()) ignite.cluster().active(true);

      // Create dummy cache to act as an entry point for SQL queries (new SQL API which do not require this
      // will appear in future versions, JDBC and ODBC drivers do not require it already).
      CacheConfiguration<?, ?> cacheCfg = new CacheConfiguration<>("default").setSqlSchema("PUBLIC");
      //
      LinkedHashMap<String, String> fields = new LinkedHashMap<>();
      fields.put("person_id", Long.class.getName());
      fields.put("name", String.class.getName());
      fields.put("address_postcode", String.class.getName());
      fields.put("age", Integer.class.getName());
      fields.put("about", String.class.getName());
      fields.put("misc", String.class.getName());

      QueryEntity testBinType = new QueryEntity();
      testBinType.setValueType("TestType");

      //primary key
      testBinType.setKeyType(Long.class.getName());
      testBinType.setKeyFieldName("person_id");

      testBinType.setFields(fields);
      testBinType.setTableName("test_type");
      testBinType.setIndexes(Arrays.asList(
        new QueryIndex("name"),
        new QueryIndex("address_postcode"),
        new QueryIndex("age"),
        new QueryIndex("about", QueryIndexType.FULLTEXT)
      ));

      CacheConfiguration<?, ?> binaryConf1 = new CacheConfiguration<>(cacheName);
      binaryConf1.setCacheMode(CacheMode.PARTITIONED);
      binaryConf1.setQueryEntities(Collections.singletonList(testBinType));
      //
      try (
        IgniteCache<?, ?> cache = ignite.getOrCreateCache(cacheCfg);
        IgniteCache<?, ?> binCacheX = ignite.getOrCreateCache(binaryConf1)
      ) {

        IgniteCache<?, ?> binCache = cache.withKeepBinary();
        // Create reference City table based on REPLICATED template.
        cache.query(new SqlFieldsQuery("CREATE TABLE IF NOT EXISTS city (id LONG PRIMARY KEY, name VARCHAR) WITH \"template=replicated\"")).getAll();
        // Create table based on PARTITIONED template with one backup.
        cache.query(new SqlFieldsQuery("CREATE TABLE IF NOT EXISTS person (id LONG, name VARCHAR, city_id LONG, PRIMARY KEY (id, city_id)) WITH \"backups=1, affinity_key=city_id\"")).getAll();
        // Create an index.
        cache.query(new SqlFieldsQuery("CREATE INDEX IF NOT EXISTS on Person (city_id)")).getAll();

        print("Created database objects.");

        SqlFieldsQuery qry = new SqlFieldsQuery("INSERT INTO city (id, name) VALUES (?, ?)");

        cache.query(qry.setArgs(1L, "Forest Hill")).getAll();
        cache.query(qry.setArgs(2L, "Denver")).getAll();
        cache.query(qry.setArgs(3L, "St. Petersburg")).getAll();

        qry = new SqlFieldsQuery("INSERT INTO person (id, name, city_id) values (?, ?, ?)");

        cache.query(qry.setArgs(1L, "John Doe", 3L)).getAll();
        cache.query(qry.setArgs(2L, "Jane Roe", 2L)).getAll();
        cache.query(qry.setArgs(3L, "Mary Major", 1L)).getAll();
        cache.query(qry.setArgs(4L, "Richard Miles", 2L)).getAll();

        qry = new SqlFieldsQuery("INSERT INTO \"some-cache-name\".test_type (person_id, name, age, about, address_postcode) values (?, ?, ?, ?, ?)");
        cache.query(qry.setArgs(1L, "Courtney", 12, "this is about me", "AB12CD")).getAll();

        SqlFieldsQuery joinQuery = new SqlFieldsQuery(
          "SELECT p.name, c.name, t.about, t.address_postcode " +
            "FROM Person p " +
            "INNER JOIN City c on c.id = p.city_id " +
            "INNER JOIN \"some-cache-name\".test_type t on p.id = t.person_id " +
            "LIMIT 50");
        List<List<?>> res = cache.query(joinQuery).getAll();
        for (Object next : res)
          System.out.println(">>>    " + next);
      } finally {
        // Distributed cache can be removed from cluster only by #destroyCache() call.
        ignite.destroyCache(cacheName);
      }

      print("Cache query DDL example finished.");
    }
  }

  /**
   * Prints message.
   *
   * @param msg Message to print before all objects are printed.
   */
  private static void print(String msg) {
    System.out.println();
    System.out.println(">>> " + msg);
  }
}
  • Renamed the default cache to "default" (you tried to create two caches both named "some-cache-name"; only one was created, and without the SQL table).
  • Removed the extra setKeyType(), renamed test_id to person_id, and removed the index on it since it's implicit.
  • Replaced the dot with an underscore in person.address.
  • Added the schema to the INSERT and SELECT, and added the missing comma.

Much better now:

>>> Created database objects.
>>>    [John Doe, St. Petersburg, this is about me, AB12CD]

>>> Cache query DDL example finished.

Upvotes: 1

alamar
alamar

Reputation: 19313

  1. Creating two tables will create two caches, named SQL_PUBLIC_A and SQL_PUBLIC_B by default: one cache, one table. In the native API, SQL queries currently need a cache to be invoked on; that's a legacy restriction and it will be lifted in the future. Which cache you invoke a query on doesn't matter, so you can have a default cache and run all CREATEs and SELECTs on it.
  2. All nodes keep bookkeeping for all caches, so this doesn't scale with the number of nodes. In practice, creating a cache is a heavy operation, and each cache takes around 20 MB of RAM per node, so avoid creating too many. Cache groups let different caches with the same structure share most of their infrastructure and save on overhead.
  3. Any key or value in a cache (that isn't a primitive) can be represented as a BinaryObject. This means you can always access rows as BinaryObjects. You can also access BinaryObjects as rows, provided you supplied a mapping when creating the cache.
  4. You can use the Cache API with BinaryObjects, including e.g. IgniteDataStreamer. Ignite has a huge number of APIs; most of them are accessible with BinaryObjects, but only SQL querying is accessible with tables. So BinaryObject is your marshalling tool.
  5. Your mileage may vary here. My recommendation is to stick to alphanumeric names. Note that BinaryObject is a superset of all features, so only a subset of BinaryObjects may make sense to the SQL engine.
  6. This image? https://ignite.apache.org/images/data_grid.png
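Points 1 and 2 above can be sketched in DDL (CACHE_GROUP is a parameter of Ignite's CREATE TABLE WITH clause; the table names here are illustrative):

```sql
-- Each CREATE TABLE produces its own cache (SQL_PUBLIC_A and
-- SQL_PUBLIC_B by default), but placing both tables in one cache
-- group lets them share partition infrastructure and reduces the
-- per-cache memory overhead.
CREATE TABLE a (id LONG PRIMARY KEY, val VARCHAR) WITH "CACHE_GROUP=shared_group";
CREATE TABLE b (id LONG PRIMARY KEY, val VARCHAR) WITH "CACHE_GROUP=shared_group";
```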

Now, fixing the code sample warrants a separate response.

Upvotes: 2
