Sai

Reputation: 1117

How to implement index update functionality in elasticsearch using spark?

I am new to Elasticsearch. I have a huge amount of data to index using Elasticsearch.

I am using Apache Spark to index the data from a Hive table into Elasticsearch.

As part of this functionality, I wrote the simple Spark script below.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.elasticsearch.spark._

object PushToES {
  def main(args: Array[String]) {
    val Array(inputQuery, index, host) = args

    // Configure Spark and point the Elasticsearch connector at the cluster
    val sparkConf = new SparkConf().setMaster("local[1]").setAppName("PushToES")
    sparkConf.set("....", host)
    sparkConf.set("....", "9200")

    val sc = new SparkContext(sparkConf)
    val ht = new HiveContext(sc)

    // Run the Hive query and write the result to Elasticsearch as JSON documents
    val ps = ht.sql(inputQuery)
    ps.toJSON.saveJsonToEs(index)
  }
}

After that I am generating a jar and submitting the job using spark-submit:

spark-submit --jars ~/*.jar --master local[*] --class com.PushToES *.jar "select * from gtest where day=20170711" gest3 localhost

Then I am executing the below command to get the document count:

curl -XGET 'localhost:9200/test/test_test/_count?pretty'

The first time, it shows the count properly:

{
  "count" : 10,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

If I execute the same curl command a second time, it gives a result like below:

{
  "count" : 20,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

If I execute the same command a third time, I am getting:

{
  "count" : 30,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

But I do not understand why, on every run, it adds the new count to the count already in the index.

Please let me know how I can resolve this issue, i.e. no matter how many times I run the job, I should get the same value (the correct count, which is 10).

I am expecting the below result for this case because the correct count is 10. (I executed a count query on the Hive table and it returns count(*) as 10 every time.)

{
  "count" : 10,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

Thanks in advance.

Upvotes: 0

Views: 1004

Answers (1)

GPI

Reputation: 9348

If you want to "replace" the data on each run, rather than "append" to it, you have to configure your Spark/Elasticsearch properties for that scenario.

The first thing you need to do is have an ID in your documents, and tell Elasticsearch which field is your id "column" (if you come from a dataframe) or key (in JSON terms).

This is documented here: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html

For cases where the id (or other metadata fields like ttl or timestamp) of the document needs to be specified, one can do so by setting the appropriate mapping namely es.mapping.id. Following the previous example, to indicate to Elasticsearch to use the field id as the document id, update the RDD configuration (it is also possible to set the property on the SparkConf though due to its global effect it is discouraged):

EsSpark.saveToEs(rdd, "spark/docs", Map("es.mapping.id" -> "id"))
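
Applied to the script in your question, that would look roughly like the sketch below. It assumes the Hive query result contains a unique field named id; that field name is purely illustrative and must be replaced by a real key in your data.

import org.elasticsearch.spark._   // brings saveJsonToEs into scope

// Assumption: the query result has a unique "id" field that can serve as the
// Elasticsearch document _id. Re-running the job then overwrites the same
// documents instead of adding new ones, so the count stays at 10.
val ps = ht.sql(inputQuery)
ps.toJSON.saveJsonToEs(index, Map("es.mapping.id" -> "id"))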

A second configuration key is available to control what kind of operation elasticsearch tries to do when writing data, but the default is correct for your use case (a short sketch follows the quoted documentation below):

es.write.operation (default index)

The write operation elasticsearch-hadoop should perform - can be any of:

index (default) new data is added while existing data (based on its id) is replaced (reindexed).

create adds new data - if the data already exists (based on its id), an exception is thrown.

update updates existing data (based on its id). If no data is found, an exception is thrown.

upsert known as merge or insert if the data does not exist, updates if the data exists (based on its id).
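
If you ever need a behaviour other than the default, the operation is passed the same way as the id mapping. A minimal sketch, again assuming an illustrative "id" field, that switches to upsert:

// Hypothetical variant: upsert instead of the default "index" operation.
// For your use case the default is fine, so this setting is optional.
ps.toJSON.saveJsonToEs(index, Map(
  "es.mapping.id"      -> "id",
  "es.write.operation" -> "upsert"
))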

Upvotes: 1
