Dmitry Pugachev

Reputation: 467

Attempt to achieve high throughput in Hyperledger Fabric network

The Hyperledger community, in the paper Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains, shows that Fabric achieves end-to-end throughput of more than 3,500 transactions per second in certain popular deployment configurations. I'm trying to achieve this result in my project, but I'm far from it. Here I report my first load-testing results and invite you to join the investigation of how to achieve high throughput with Hyperledger Fabric and Composer.

Project description

We are building a high-load service that uses Hyperledger Fabric. Our backend consists of an HF blockchain network, several microservices (Node.js) that communicate with the blockchain via Hyperledger Composer, and a message broker for communication between the microservices.

Hyperledger Fabric v1.1, Hyperledger Composer v0.19.0.
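For context, each microservice holds one long-lived Composer connection and runs its ledger queries through it, roughly along the lines of the sketch below. The business-network name and the query name are placeholders, not our real ones; only the identity name 'txBuilder' is taken from our setup.

    'use strict';

    // Rough sketch of the permanent Composer connection (composer-client v0.19).
    // 'txBuilder@load-test-network' and 'selectAllParticipants' are placeholders.
    const { BusinessNetworkConnection } = require('composer-client');

    const connection = new BusinessNetworkConnection();
    let connected = false;

    // Connect once with an existing identity card and reuse the connection
    async function getConnection() {
      if (!connected) {
        await connection.connect('txBuilder@load-test-network');
        connected = true;
      }
      return connection;
    }

    // Run a named query (defined in the business network's queries.qry file)
    async function queryLedger() {
      const conn = await getConnection();
      return conn.query('selectAllParticipants');
    }

    module.exports = { queryLedger };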

Fabric network (deployed with Cello):

{
    fabric001: {
      cas: [],
      peers: ["[email protected]"],
      orderers: ["orderer1st.orderer"],
      zookeepers: ["zookeeper1st"],
      kafkas: ["kafka1st"]
    },
    fabric002: {
      cas: [],
      peers: ["[email protected]"],
      orderers: ["orderer2nd.orderer"],
      zookeepers: ["zookeeper2nd"],
      kafkas: ["kafka2nd"]
    },
    fabric003: {
      cas: [],
      peers: ["[email protected]"],
      orderers: ["orderer3rd.orderer"],
      zookeepers: ["zookeeper3rd"],
      kafkas: ["kafka3rd"]
    },
    fabric004: {
      cas: ["ca1st.main"],
      peers: [],
      orderers: ["orderer4th.orderer"],
      zookeepers: ["zookeeper4th"],
      kafkas: ["kafka4th"]
    }
}

fabric001-004 are AWS EC2 instances of the t2.xlarge type. Initially I used m5.4xlarge, but it costs a lot, and CPU usage was always low even when Fabric started to fail.

Fabric config:

BatchTimeout: 0.2s
BatchSize:
    MaxMessageCount: 10
    AbsoluteMaxBytes: 98 MB
    PreferredMaxBytes: 512 KB

TLS disabled.

If required, I can perform new tests with any configuration.


Load testing

First of all, I decided to test queries against the ledger state (CouchDB). The blockchain is empty: only system data and a few participants. Direct query requests to the exposed CouchDB port are very fast (~150 ms). My microservice connects to Fabric by establishing a permanent connection for an existing identity. Without load, requests take ~500 ms in our system; half of this time is spent in the message broker (AWS SQS is really slow). For load testing I'm using Yandex.Tank. The load goes smoothly, without latency increasing, up to ~70 requests per second. Then the latency statistics degrade, and at some point the chaincode starts returning error messages. You can see the test results here:

TEST RESULTS
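For reference, the endpoint Yandex.Tank hits is essentially a thin HTTP wrapper around that Composer query. A simplified sketch (Express, with the route name made up and the SQS hop from the real system left out) looks like this:

    'use strict';

    // Simplified sketch of the query endpoint under load. The route name is
    // hypothetical and the AWS SQS hop from the real system is omitted.
    const express = require('express');
    const { queryLedger } = require('./composer-connection'); // sketch above

    const app = express();

    app.get('/participants', async (req, res) => {
      const started = Date.now();
      try {
        const results = await queryLedger();
        res.json({ count: results.length, tookMs: Date.now() - started });
      } catch (err) {
        // Under load, the UNKNOWN / timeout errors from the peer surface here
        res.status(500).json({ error: err.message });
      }
    });

    app.listen(3000, () => console.log('query service listening on :3000'));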

There are two types of error message that I received during the load-test iterations:

1.

[Hyperledger-Composer] undefined:HLFQueryHandler :queryChaincode() query payload returned an error: Error: 2 UNKNOWN: error executing chaincode: failed to execute transaction: timeout expired while executing transaction

2.

HLFQueryHandler :queryChaincode() query payload returned an error: Error: 2 UNKNOWN: error executing chaincode: transaction returned with failure: Error: The current identity, with the name 'txBuilder' and the identifier '5606acbada327a8ef33134e601f990076872b31a3dda5ec0a983e04915d16007', has not been registered

The chaincode container does not restart by itself, but from this point it doesn't work well: sometimes I can ping it, sometimes I can't, and either way the latency is terrible. Only restarting the peer container helps. (I remind you that ledger requests go through a single peer because of Composer; that's not good, but it's not the point of my investigation.) The second error is really strange, because this is the only identity I use, and it works before the chaincode starts to fail, and again after I restart the peer.

While the load is applied, the peer, chaincode and CouchDB containers consume the most CPU (as expected). I'm in the middle of configuring a monitoring system for my blockchain network, and soon I will be able to share more information.

Any thoughts?


UPDATE #1

I've been advised to use c*-type (compute-optimized) AWS instances for deploying Fabric. I chose c5.4xlarge (16 vCPUs) for my tests. I also changed the Fabric config a little bit:

BatchTimeout: 1s
BatchSize:
    MaxMessageCount: 20
    AbsoluteMaxBytes: 98 MB
    PreferredMaxBytes: 512 KB

I performed the same test and, to my regret, I got the same result:

TEST RESULTS

In the figure below you can see a plot of the containers' CPU usage during the test, which lasted 1 minute:

CPU load of fabric001 instance

Peak total CPU usage was ~30%, so we can see that the cause of the latency degradation lies elsewhere.


UPDATE #2

As the performance results were very poor, I decided to continue my tests with pure Fabric, without any unnecessary intermediate components: just the Fabric network and the Node.js SDK. See the new report here.
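For these new tests, the query path is reduced to something like the following fabric-client sketch. The connection profile path, channel, chaincode and user names are placeholders, and it assumes an identity has already been enrolled.

    'use strict';

    // Sketch of a direct chaincode query with the plain Node SDK (fabric-client
    // v1.1), i.e. no Composer and no message broker. All names are placeholders.
    const Client = require('fabric-client');

    async function directQuery() {
      const client = Client.loadFromConfig('./connection-profile.yaml');
      await client.initCredentialStores();
      // Assumes the user was already enrolled and persisted in the credential store
      await client.getUserContext('admin', true);

      const channel = client.getChannel('mychannel');
      const payloads = await channel.queryByChaincode({
        chaincodeId: 'mycc',
        fcn: 'query',
        args: ['key1']
      });
      return payloads.map((p) => p.toString('utf8'));
    }

    directQuery().then(console.log).catch(console.error);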

Upvotes: 8

Views: 3196

Answers (2)

biligunb

Reputation: 41

First of all, how many peers you have will affect the TPS result. It is almost always better to have more peers (but it really depends on the strategy and many other things).

Secondly, the batch size, timeout, and message count all matter too. If you need higher TPS, you might need a bigger batch size and a higher message count (100, for example).

Also, it seems the Java SDK is a little bit faster than the Node SDK, but I have not confirmed that myself. It is possible to go over 1000 TPS, though (this I have confirmed myself).

Upvotes: 0

Ashish Mishra

Reputation: 119

I did a similar test with a similar kind of setup and could achieve about 220 RPS using 8 peer nodes in a single org. With a second org, this performance would drop for sure. I used the high-performance chaincode provided with the fabric-samples. I'm not sure how they managed to get 3500 RPS.

Upvotes: 1
