khrist safalhai

Reputation: 570

Spark app unable to write to Elasticsearch cluster running in Docker

I have an Elasticsearch Docker image listening on 127.0.0.1:9200. I tested it using Sense and Kibana, and it works fine: I am able to index and query documents. Now, when I try to write to it from a Spark app:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark._ // brings saveToEs into scope

val sparkConf = new SparkConf().setAppName("ES").setMaster("local")
sparkConf.set("es.index.auto.create", "true")
sparkConf.set("es.nodes", "127.0.0.1")
sparkConf.set("es.port", "9200")
sparkConf.set("es.resource", "spark/docs")

val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")
val rdd = sc.parallelize(Seq(numbers, airports))

rdd.saveToEs("spark/docs")

It fails to connect and keeps retrying:

16/07/11 17:20:07 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Operation timed out
16/07/11 17:20:07 INFO HttpMethodDirector: Retrying request

I tried using the IP address given by docker inspect for the Elasticsearch container, but that does not work either. However, when I use a native installation of Elasticsearch, the Spark app runs fine. Any ideas?

Upvotes: 6

Views: 1117

Answers (3)

antike

Reputation: 231

I had the same problem, with the further issue that the settings applied via sparkConf.set() had no effect. Supplying the configuration to the save call itself worked, like this:

rdd.saveToEs("spark/docs", Map("es.nodes" -> "127.0.0.1", "es.nodes.wan.only" -> "true"))
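
Note that saveToEs also accepts a per-operation configuration Map, and options passed there override the ones set on the SparkConf, which is why this form can work when sparkConf.set() appears to be ignored.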

Upvotes: 1

bp2010

Reputation: 2462

Also, set the config es.nodes.wan.only to true, as mentioned in this answer, if you are having issues writing to ES.
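
A minimal sketch of where that setting would go, assuming the same sparkConf as in the question:

// Disables node discovery: the connector talks only to the declared es.nodes,
// not the Docker-internal addresses the cluster would otherwise advertise.
sparkConf.set("es.nodes.wan.only", "true")

This matters here because the container only publishes 127.0.0.1:9200 to the host, so the internal node addresses returned by discovery are unreachable from the Spark app.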

Upvotes: 3

Eyal.Dahari

Reputation: 770

A couple of things I would check:

  • The version of the Elasticsearch-Hadoop Spark connector you are working with. Make sure it is not a beta release; there was a bug related to IP resolution that has since been fixed.

  • Since 9200 is the default port, you may remove the line sparkConf.set("es.port", "9200") and check.

  • Check that there is no proxy configured in your Spark environment or config files.

  • I assume that you run Elasticsearch and Spark on the same machine. Can you try configuring your machine's IP address instead of 127.0.0.1 (see the sketch below)?
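
Putting these together, a minimal sketch of the revised setup; 192.168.1.50 is a placeholder for your machine's actual IP address:

val sparkConf = new SparkConf().setAppName("ES").setMaster("local")
sparkConf.set("es.index.auto.create", "true")
// Placeholder address: replace with your machine's IP instead of 127.0.0.1.
sparkConf.set("es.nodes", "192.168.1.50")
// es.port is omitted; 9200 is the default.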

Hope this helps! :)

Upvotes: 1
