Apache NiFi to ElasticSearch connection is slow
We are trying to build a data pipeline in Apache NiFi which will:
- Pull a large amount of data from several MySQL databases (more than 150 million rows in total)
- Convert it to JSON format (arrays or objects)
- Push them to ElasticSearch (later Apache Superset will use those indices as datasets)
For more context, I am using these processors in Apache NiFi, in this order:
- ExecuteSQL -> select all the table names from database
- ConvertAvroToJSON -> convert the list of table names from Avro to JSON
- SplitJson -> split the list into one FlowFile per table name
- EvaluateJsonPath -> read the FlowFile content from the previous processor and extract the table name
- GenerateTableFetch -> Produce SELECT queries from tables
- ExecuteSQL -> to execute queries coming from GenerateTableFetch
- SplitAvro -> Splitting output of ExecuteSQL
- ConvertAvroToJSON -> convert the SplitAvro results to JSON for ElasticSearch
- UpdateAttribute -> lowercase the tableName attribute, since ElasticSearch doesn't accept uppercase letters in index names
- PutElasticsearchRecord -> Pushing records into ElasticSearch
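To make the last two steps concrete, here is a rough Python sketch of what I understand them to amount to: lowercasing the table name and grouping JSON rows into request bodies for the ElasticSearch `_bulk` API. This is only an illustration, not NiFi's actual implementation; the function names `index_name_for` and `bulk_bodies` and the `batch_size` value are hypothetical.

```python
import json
from typing import Iterable, Iterator


def index_name_for(table_name: str) -> str:
    """Lowercase the table name, since index names must be lowercase."""
    return table_name.lower()


def bulk_bodies(table_name: str, rows: Iterable[dict],
                batch_size: int = 1000) -> Iterator[str]:
    """Yield newline-delimited JSON bodies for the _bulk API.

    Each row becomes two lines: an "index" action line and the document.
    batch_size is an assumed value for illustration, not a recommendation.
    """
    action = json.dumps({"index": {"_index": index_name_for(table_name)}})
    lines = []
    for row in rows:
        lines.append(action)
        lines.append(json.dumps(row))
        if len(lines) == 2 * batch_size:
            # A bulk body must end with a trailing newline.
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample_rows = [{"id": i, "name": f"row-{i}"} for i in range(5)]
    for body in bulk_bodies("Customers", sample_rows, batch_size=2):
        print(body)
```

In this sketch the number of requests is roughly the row count divided by `batch_size`, so a FlowFile that carries a single record would translate into one request per row.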
However, the last step, pushing the records with PutElasticsearchRecord, is extremely slow.
I have set up ElasticSearch and Apache NiFi on separate EC2 instances. Each machine has 32GB of RAM. Apache NiFi has a 20GB JVM heap and ElasticSearch has a 16GB JVM heap.
Even with only 12,000 rows going through the pipeline, nowhere near millions, the final push to ElasticSearch is very slow. When I check resource usage on the host machines, the Apache NiFi machine is at 46% RAM usage and the ElasticSearch machine is at 12%. Could you please help me understand what I am doing wrong or what else I should try? I don't want to keep adding RAM unnecessarily.
Thank you!