anber

Running OrientDB in distributed mode on AWS does not work

I have 3 OrientDB (2.2.7) nodes set up on AWS, running in distributed mode.

Whenever I connect to the server on port 2424 with pyorient, the connection locks up.

I'm aware of some issues regarding running OrientDB in distributed mode, as described in this question: Creating a database in Orientdb in distributed mode

In order to avoid any issues, I'm running permanent instances as suggested by the documentation.

I also configured the EC2 instances as "c3.4xlarge" instances, as suggested by the Hazelcast EC2 whitepaper (Amazon_EC2_Deployment_Guide_v0.3_web.pdf).

I configured my hazelcast.xml to use the tcp-ip and aws discovery strategies (one at a time), and both delivered the same results. The servers can be seen connecting to one another via Hazelcast, so discovery is working fine.

I have the following policies attached to my user.

{
"Version": "2012-10-17",
"Statement": [
    {
        "Sid": "Stm7747196888759",
        "Action": [
            "ec2:DescribeInstances"
        ],
        "Effect": "Allow",
        "Resource": "*"
    }
]
}
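
Hazelcast's `aws` join discovers members by calling the EC2 `DescribeInstances` API, so that is the action this policy has to allow. Below is a minimal sketch for sanity-checking a policy document offline. It assumes a deliberately simplified model of IAM evaluation (only `Allow` statements, with exact, `service:*` or `*` action matches; `Deny`, `NotAction`, conditions and resource scoping are ignored), and `allows_action` is a hypothetical helper, not part of any AWS SDK:

```python
import json

# The policy quoted above, copied verbatim apart from whitespace.
POLICY_JSON = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stm7747196888759",
            "Action": ["ec2:DescribeInstances"],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
"""


def allows_action(policy: dict, action: str) -> bool:
    """Return True if an Allow statement lists `action`, `service:*` or `*`.

    Simplified on purpose: Deny, NotAction, Condition and Resource are ignored.
    IAM action names are case-insensitive, so comparison is done in lower case.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    wanted = action.lower()
    service_wildcard = wanted.split(":", 1)[0] + ":*"
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if any(a.lower() in ("*", wanted, service_wildcard) for a in actions):
            return True
    return False


policy = json.loads(POLICY_JSON)
print(allows_action(policy, "ec2:DescribeInstances"))  # True
```

Since the final hazelcast.xml uses tcp-ip rather than aws, this only matters when switching back to AWS discovery.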

Each node has its hazelcast.xml configured like so:

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.7.xsd"
       xmlns="http://www.hazelcast.com/schema/config"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <group>
            <name>orientdb</name>
            <password>xxxxxxxxx</password>
    </group>
    <properties>
            <property name="hazelcast.local.localAddress">{{LOCAL_IP}}</property>
            <property name="hazelcast.icmp.enabled">true</property>
    </properties>
    <network>
            <public-address>{{PUBLIC_IP}}</public-address>
            <port auto-increment="true">2434</port>
            <join>
                    <multicast enabled="false">
                            <multicast-group>235.1.1.1</multicast-group>
                            <multicast-port>2434</multicast-port>
                    </multicast>
                    <tcp-ip enabled="true">
                            <member>57.xx.xx.165</member>
                            <member>57.xx.xx.236</member>
                            <member>57.xx.xx.133</member>
                    </tcp-ip>
                    <aws enabled="false">
                            <access-key>xxxx</access-key>
                            <secret-key>xxxx</secret-key>
                            <host-header>ec2.amazonaws.com</host-header>
                            <region>eu-west-1</region>
                    </aws>
            </join>
            <interfaces enabled="false">
                    <interface>{{LOCAL_IP}}</interface>
            </interfaces>
    </network>
    <executor-service>
            <pool-size>16</pool-size>
    </executor-service>
</hazelcast>
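
Because the file is templated (`{{LOCAL_IP}}`, `{{PUBLIC_IP}}`), it can help to render it and confirm which join mechanism is actually active and which members are listed; as far as I know, Hazelcast 3.x expects exactly one join mechanism to be enabled. Below is a minimal standard-library sketch. `summarize_join`, the trimmed sample config and its `10.0.0.x`/`203.0.113.x` addresses are all hypothetical, and the plain string substitution is only an assumption about what your deployment step does with the placeholders:

```python
import xml.etree.ElementTree as ET

NS = {"hz": "http://www.hazelcast.com/schema/config"}

# Trimmed, hypothetical version of the config above (addresses are made up).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
    <network>
        <public-address>{{PUBLIC_IP}}</public-address>
        <port auto-increment="true">2434</port>
        <join>
            <multicast enabled="false"/>
            <tcp-ip enabled="true">
                <member>10.0.0.11</member>
                <member>10.0.0.12</member>
                <member>10.0.0.13</member>
            </tcp-ip>
            <aws enabled="false"/>
        </join>
        <interfaces enabled="false">
            <interface>{{LOCAL_IP}}</interface>
        </interfaces>
    </network>
</hazelcast>
"""


def summarize_join(xml_text: str, local_ip: str, public_ip: str) -> dict:
    """Substitute the placeholders, then report enabled join methods and tcp-ip members."""
    rendered = xml_text.replace("{{LOCAL_IP}}", local_ip).replace("{{PUBLIC_IP}}", public_ip)
    root = ET.fromstring(rendered.encode("utf-8"))  # parse bytes so the XML declaration is accepted
    join = root.find("hz:network/hz:join", NS)
    enabled = {}
    for name in ("multicast", "tcp-ip", "aws"):
        el = join.find(f"hz:{name}", NS)
        # This sketch treats a missing element or attribute as disabled;
        # Hazelcast's own defaults may differ.
        enabled[name] = el is not None and el.get("enabled", "false").strip().lower() == "true"
    members = [m.text.strip() for m in join.findall("hz:tcp-ip/hz:member", NS) if m.text]
    public = root.findtext("hz:network/hz:public-address", default="", namespaces=NS).strip()
    return {"enabled": enabled, "members": members, "public_address": public}


summary = summarize_join(SAMPLE, "10.0.0.11", "203.0.113.11")
print(summary["enabled"])  # {'multicast': False, 'tcp-ip': True, 'aws': False}
if sum(summary["enabled"].values()) != 1:
    print("warning: expected exactly one join mechanism to be enabled")
```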

As can be seen from my hazelcast.xml, I also tried upgrading Hazelcast to version 3.7. Regardless of which Hazelcast version I use, the results are the same.

As soon as I connect to the server on port 2424, the connection locks up. The server still works fine over HTTP on port 2480: you can still use the front end in the browser, but you can't open a connection via pyorient.
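
Since HTTP on 2480 responds while 2424 hangs, one way to tell a network or server-side hang from a client-side one is to open a raw socket to the binary port with a timeout. My understanding of the OrientDB binary protocol is that the server sends a 2-byte big-endian protocol version immediately after accepting a connection; this sketch assumes that and reads only those two bytes. `probe_binary_port` is a hypothetical helper, not part of pyorient:

```python
import socket
import struct


def probe_binary_port(host: str, port: int = 2424, timeout: float = 5.0) -> int:
    """Connect to the binary port and return the protocol version the server sends first.

    Raises socket.timeout (a subclass of OSError) if nothing arrives within
    `timeout` seconds, and ConnectionError if the server closes the socket early.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        while len(data) < 2:
            chunk = sock.recv(2 - len(data))
            if not chunk:
                raise ConnectionError("connection closed before a protocol version was received")
            data += chunk
    (version,) = struct.unpack(">h", data)  # signed 16-bit, network byte order
    return version
```

If `probe_binary_port("<node address>")` returns a version on every node, TCP and the initial handshake are fine and the hang is more likely in the client; if it times out, security groups or the server itself are the next place to look.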

We have a large DB and collect around 2.5 million vertices and about 5 million edges each month. Running in distributed mode is vital for us because a single server won't be able to scale beyond that capacity. As things stand, OrientDB appears to support running as a distributed database, but that functionality doesn't seem to work.

We were running the Docker images but switched to the binaries in order to upgrade to Hazelcast 3.7.

Has anyone been able to get OrientDB working in production as distributed and what are we missing?

Answers (2)

anber

This does not seem to be an issue with Hazelcast or AWS. There were two issues with my setup. The first is that OrientDB was not refreshing or replacing my distributed-config.json with the settings from default-distributed-db-config.json. As a result, every node that had ever connected to my DB was appended to that file, and none of my default-distributed-db-config.json settings were reflected in it.

I added a start-up script that deletes distributed-config.json every time the server starts, in order to refresh the list of nodes and pick up my settings.
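
A start-up step along these lines could run before the server is launched. It is a minimal sketch, assuming the 2.2 layout where each database keeps its own `distributed-config.json` under `$ORIENTDB_HOME/databases/<db>/`; check where the file actually lives in your installation before deleting anything. `remove_distributed_configs` is a hypothetical name:

```python
from pathlib import Path
from typing import List


def remove_distributed_configs(orientdb_home: str) -> List[Path]:
    """Delete databases/<db>/distributed-config.json under orientdb_home.

    Intended to run before the server starts, so the file is rebuilt from
    default-distributed-db-config.json. Returns the paths that were removed.
    """
    db_root = Path(orientdb_home) / "databases"
    if not db_root.is_dir():
        return []
    removed = []
    for cfg in sorted(db_root.glob("*/distributed-config.json")):
        cfg.unlink()
        removed.append(cfg)
    return removed
```

It could be called as `remove_distributed_configs(os.environ["ORIENTDB_HOME"])` from whatever starts the server; everything else in the database directories is left alone.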

The second issue has to do with pyorient. It has a bug: it can't parse the messages returned by OrientDB in distributed mode, which sends the connection into an infinite loop.
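
Until the client is fixed, a hang like this can at least be turned into an error. Below is a minimal, generic sketch (not part of pyorient) that runs a blocking call in a daemon thread and raises `TimeoutError` if it does not return in time. The stuck thread is not killed, only abandoned, so this surfaces the problem rather than fixing it. `call_with_timeout` is a hypothetical helper:

```python
import threading


def call_with_timeout(fn, *args, timeout: float = 10.0, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread and wait at most `timeout` seconds."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # re-raised in the caller's thread below
            outcome["error"] = exc

    # daemon=True so an abandoned, stuck call does not block interpreter exit
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError(f"call did not return within {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


print(call_with_timeout(sum, [1, 2, 3], timeout=1.0))  # 6
```

Wrapping a call such as `client.db_open(...)` this way would report the infinite loop as a `TimeoutError` after the deadline instead of blocking forever.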

There is currently a development branch of pyorient that implements the missing binary serialiser (OrientSerialization.Binary). I have another branch with some fixes merged into it.

Install it with:

pip install https://github.com/anber500/pyorient/tarball/17f5e42e83859a661c6483f7fa812226194694dd#egg=pyorient

Set your serialiser as follows:

client = pyorient.OrientDB("localhost", 2424, serialization_type=pyorient.OrientSerialization.Binary)

You will also need an updated version of pyorient_native. The first release had a memory leak, so use the version from the master branch:

pip install https://github.com/nikulukani/pyorient_native/tarball/master#egg=pyorient_native
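
Because the binary serialiser needs both the patched pyorient and pyorient_native, a quick check that the interpreter actually picks up both can save some confusion. This is a minimal sketch using only `importlib`; it looks for `pyorient.OrientSerialization.Binary` as used above and assumes nothing else about either package's API:

```python
import importlib
import importlib.util


def check_binary_serializer_setup() -> dict:
    """Report whether pyorient exposes OrientSerialization.Binary and pyorient_native is importable."""
    report = {"pyorient": False, "binary_serialization": False, "pyorient_native": False}
    if importlib.util.find_spec("pyorient") is not None:
        try:
            pyorient = importlib.import_module("pyorient")
        except Exception:  # a broken install should be reported, not crash the check
            pyorient = None
        if pyorient is not None:
            report["pyorient"] = True
            serialization = getattr(pyorient, "OrientSerialization", None)
            report["binary_serialization"] = hasattr(serialization, "Binary")
    report["pyorient_native"] = importlib.util.find_spec("pyorient_native") is not None
    return report


for name, ok in check_binary_serializer_setup().items():
    print(f"{name:22} {'ok' if ok else 'MISSING'}")
```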

This works perfectly on AWS in distributed mode and is much faster than the CSV serializer.

Hope it helps.

pveentjer

You are using the EC2 public IP addresses and not the EC2 private IP addresses. Public IP addresses often start with 57 or 54; private IP addresses often start with 10.
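
Following that suggestion, here is a quick check of which tcp-ip members are not private addresses, using Python's standard `ipaddress` module. The addresses are hypothetical (the real ones in the question are masked). Note that `is_private` also covers a few other reserved ranges besides 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16:

```python
import ipaddress
from typing import Iterable, List


def non_private_members(members: Iterable[str]) -> List[str]:
    """Return the members whose address is not in a private/reserved range."""
    return [m for m in members if not ipaddress.ip_address(m.strip()).is_private]


# Hypothetical member list mixing private and public addresses.
members = ["10.0.1.15", "172.31.5.20", "54.0.0.1"]
print(non_private_members(members))  # ['54.0.0.1']
```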
