bosco1

Reputation: 51

Titan graph database too slow with 100,000+ vertices despite indices: how can I optimize it?

Here is the index-creation code:

```java
g = TitanFactory.build().set("storage.backend", "cassandra")
        .set("storage.hostname", "127.0.0.1").open();

TitanManagement mgmt = g.getManagementSystem();

PropertyKey db_local_name = mgmt.makePropertyKey("db_local_name")
        .dataType(String.class).make();
mgmt.buildIndex("byDb_local_name", Vertex.class).addKey(db_local_name)
        .buildCompositeIndex();

PropertyKey db_schema = mgmt.makePropertyKey("db_schema")
        .dataType(String.class).make();
mgmt.buildIndex("byDb_schema", Vertex.class).addKey(db_schema)
        .buildCompositeIndex();

PropertyKey db_column = mgmt.makePropertyKey("db_column")
        .dataType(String.class).make();
mgmt.buildIndex("byDb_column", Vertex.class).addKey(db_column)
        .buildCompositeIndex();

PropertyKey type = mgmt.makePropertyKey("type").dataType(String.class)
        .make();
mgmt.buildIndex("byType", Vertex.class).addKey(type)
        .buildCompositeIndex();

PropertyKey value = mgmt.makePropertyKey("value")
        .dataType(Object.class).make();
mgmt.buildIndex("byValue", Vertex.class).addKey(value)
        .buildCompositeIndex();

PropertyKey index = mgmt.makePropertyKey("index")
        .dataType(Integer.class).make();
mgmt.buildIndex("byIndex", Vertex.class).addKey(index)
        .buildCompositeIndex();

mgmt.commit();
```

Below is the code that searches for vertices and then adds a vertex with 3 edges, running on a 3 GHz PC with 2 GB RAM. It processes 830 vertices in 3 hours, and I have 100,000 rows of data, so it is far too slow:

```java
for (Object[] rowObj : list) {
    // TXN_ID
    Iterator<Vertex> iter = g.query()
            .has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS").has("db_column", "txn_id")
            .has("value", rowObj[0]).vertices().iterator();
    if (iter.hasNext()) {
        vertex1 = iter.next();
        logger.debug("vertex1=" + vertex1.getId() + ","
                + vertex1.getProperty("db_local_name") + ","
                + vertex1.getProperty("db_schema") + ","
                + vertex1.getProperty("db_column") + ","
                + vertex1.getProperty("type") + ","
                + vertex1.getProperty("index") + ","
                + vertex1.getProperty("value"));
    }
    // TXN_TYPE
    iter = g.query().has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS").has("db_column", "txn_type")
            .has("value", rowObj[1]).vertices().iterator();
    if (iter.hasNext()) {
        vertex2 = iter.next();
        logger.debug("vertex2=" + vertex2.getId() + ","
                + vertex2.getProperty("db_local_name") + ","
                + vertex2.getProperty("db_schema") + ","
                + vertex2.getProperty("db_column") + ","
                + vertex2.getProperty("type") + ","
                + vertex2.getProperty("index") + ","
                + vertex2.getProperty("value"));
    }
    // WALLET_ID
    iter = g.query().has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS").has("db_column", "wallet_id")
            .has("value", rowObj[2]).vertices().iterator();
    if (iter.hasNext()) {
        vertex3 = iter.next();
        logger.debug("vertex3=" + vertex3.getId() + ","
                + vertex3.getProperty("db_local_name") + ","
                + vertex3.getProperty("db_schema") + ","
                + vertex3.getProperty("db_column") + ","
                + vertex3.getProperty("type") + ","
                + vertex3.getProperty("index") + ","
                + vertex3.getProperty("value"));
    }

    vertex4 = g.addVertex(null);
    vertex4.setProperty("db_local_name", "Report Name 1");
    vertex4.setProperty("db_schema", "MPS");
    vertex4.setProperty("db_column", "amount");
    vertex4.setProperty("type", "indivisual_0");
    vertex4.setProperty("value", rowObj[3].toString());
    vertex4.setProperty("index", i);

    vertex1.addEdge("data", vertex4);
    logger.debug("vertex1 added");
    vertex2.addEdge("data", vertex4);
    logger.debug("vertex2 added");
    vertex3.addEdge("data", vertex4);
    logger.debug("vertex3 added");
    i++;
    g.commit();
}
```

Is there any way to optimize this code?

Upvotes: 1

Views: 1247

Answers (1)

stephen mallette

Reputation: 46216

For completeness, this question was answered in the Aurelius Graphs mailing list:

https://groups.google.com/forum/#!topic/aureliusgraphs/XKT6aokRfFI

Basically:

  1. Build and use a real multi-key composite index: `mgmt.buildIndex("by_local_name_schema_value", Vertex.class).addKey(db_local_name).addKey(db_schema).addKey(value).buildCompositeIndex();`
  2. Don't call `g.commit()` after each loop cycle; instead do something like this: `if (++i % 10000 == 0) g.commit();`
  3. Turn on `storage.batch-loading` if you aren't already doing so.
  4. If all you can throw at Cassandra is 2 GB of RAM, consider using BerkeleyDB instead. Cassandra prefers 4 GB of RAM minimum and would probably like "more".
  5. I don't know the nature of your data, but can you pre-sort it and use `BatchGraph` as described in the Powers of Ten - Part I blog post and in the wiki? Using `BatchGraph` would save you from having to maintain the transaction described in number 2 above.
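Points 1 through 3 could be combined into something like the sketch below. It assumes the Titan 0.x API used in the question; the index name `by_name_schema_column_value`, the batch size of 10,000, and the choice to type `value` as `String` (rather than `Object`, as in the question) are illustrative assumptions, not values from the thread:

```java
// Sketch only, not a drop-in fix: requires a running Cassandra instance.
TitanGraph g = TitanFactory.build()
        .set("storage.backend", "cassandra")
        .set("storage.hostname", "127.0.0.1")
        .set("storage.batch-loading", true)  // point 3: relax checks for bulk loads
        .open();

TitanManagement mgmt = g.getManagementSystem();
PropertyKey db_local_name = mgmt.makePropertyKey("db_local_name").dataType(String.class).make();
PropertyKey db_schema = mgmt.makePropertyKey("db_schema").dataType(String.class).make();
PropertyKey db_column = mgmt.makePropertyKey("db_column").dataType(String.class).make();
PropertyKey value = mgmt.makePropertyKey("value").dataType(String.class).make();

// Point 1: one composite index spanning all the keys queried together,
// so each lookup is a single index hit instead of intersecting
// several single-key indexes.
mgmt.buildIndex("by_name_schema_column_value", Vertex.class)  // hypothetical name
        .addKey(db_local_name).addKey(db_schema)
        .addKey(db_column).addKey(value)
        .buildCompositeIndex();
mgmt.commit();

// Point 2: commit once per batch rather than once per row.
int i = 0;
for (Object[] rowObj : list) {
    // ... vertex lookups and addVertex/addEdge as in the question ...
    if (++i % 10000 == 0) {
        g.commit();
    }
}
g.commit();  // flush the final partial batch
```

For 100,000 rows this reduces the commit count from 100,000 to roughly 10, and each lookup in the loop becomes a direct hit on the multi-key index.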

Upvotes: 1
