Reputation: 6769
Reading from Elasticsearch v6.2
into Spark using the prescribed Spark connector org.elasticsearch:elasticsearch-spark-20_2.11:6.3.2
is horrendously slow. This is from a 3-node ES cluster with this index:
curl https://server/_cat/indices?v
green open db MmVwAwYfTz4eE_L-tncbwQ 5 1 199983131 9974871 105.1gb 51.8gb
Reading on a (10-node, 1 TB memory, >50 vCPUs) Spark cluster:
val query = """{
"query": {
"match_all": {}
}
}"""
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "server")
  .option("es.port", "443")
  .option("es.net.ssl", "true")
  .option("es.nodes.wan.only", "true")
  .option("es.input.use.sliced.partitions", "false")
  .option("es.scroll.size", "1000")
  .option("es.read.field.include", "f1,f2,f3")
  .option("es.query", query)
  .load("db")

df.take(1)
That took 10 minutes to execute.
Is this how (slowly) it's supposed to work, or am I doing something wrong?
Upvotes: 2
Views: 945
Reputation: 11
This is not how slow it is supposed to be, and the answer can be found in the screenshot you shared:
The Stages: Succeeded/Total column
in the Spark UI shows only one task running the read operation. I don't think that is what you would expect; otherwise, what's the point of having a whole cluster?
I faced the same problem, and it took me a while to figure out that Spark assigns one task (one partition) to each shard of the Elasticsearch index.
So there is our answer: to go faster we need to parallelise the read, and the way to do that is to distribute the source index across multiple shards.
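As a quick check (a minimal sketch, reusing the df from your question), you can confirm how many partitions the connector actually created; with a single-shard index it will report 1:

// Number of Spark partitions created by the connector;
// this should match the number of shards in the source index.
val numPartitions = df.rdd.getNumPartitions
println(s"Read partitions: $numPartitions")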
By default, Elasticsearch creates an index with a single shard (since 7.0; 6.x versions default to five), but it is possible to customise this when the index is created, as below:
PUT /index-name
{
  "settings": {
    "index": {
      "number_of_shards": x,
      "number_of_replicas": xx
    }
  }
}
The number of shards can be higher than the number of Elasticsearch nodes; this is all transparent to Spark. If the index already exists, create a new index with more shards and then use the Elasticsearch Reindex API to copy the data over.
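As a rough sketch (the target name db_sharded is just an example, and the new index must already exist with the desired shard count), the reindex call looks like this:

POST /_reindex
{
  "source": { "index": "db" },
  "dest":   { "index": "db_sharded" }
}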
I hope this solves your problem.
Upvotes: 1