Nakeuh

Reputation: 1919

Scala : Read Array value in Elasticsearch with Spark

I am trying to read data from Elasticsearch, but the document I want to read contains a nested array, which I also need to read.

I included the option "es.read.field.as.array.include" in the following way :

val dataframe = reader
            .option("es.read.field.as.array.include","arrayField")
            .option("es.query", "someQuery")
            .load("Index/Document")

But I got this error:

java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to java.lang.Float

How should I map my array to read it ?

Sample of data from ES :

{
    "_index": "Index",
    "_type": "Document",
    "_id": "ID",
    "_score": 1,
    "_source": {
        "currentTime": 1516211640000,
        "someField": someValue,
        "arrayField": [
        {
            "id": "000",
            "field1": 14,
            "field2": 20.23871387052084,
            "innerArray": [[ 55.2754,25.1909],[ 55.2754,25.190929],[ 55.27,25.190]]
        }, ...
        ],
        "meanError": 0.3082
    }
}

Upvotes: 0

Views: 1672

Answers (1)

Neil_TW

Reputation: 97

In your sample data, innerArray is itself an array of arrays, so it has to be declared as a two-dimensional array (the :2 suffix on the field name).

You can try this sample:

val es = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.read.field.as.array.include","arrayField,arrayField.innerArray:2")
  .option("es.query", "someQuery")
  .load("Index/Document")

This produces the following schema and output:

 |-- arrayField: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- field1: long (nullable = true)
 |    |    |-- field2: float (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- innerArray: array (nullable = true)
 |    |    |    |-- element: array (containsNull = true)
 |    |    |    |    |-- element: float (containsNull = true)
 |-- currentTime: long (nullable = true)
 |-- meanError: float (nullable = true)
 |-- someField: string (nullable = true)


 +--------------------+-------------+---------+---------+
 |          arrayField|  currentTime|meanError|someField|
 +--------------------+-------------+---------+---------+
 |[[14,20.238714,00...|1516211640000|   0.3082|someValue|
 +--------------------+-------------+---------+---------+
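
Once the DataFrame has the schema above, the nested values can be reached with explode. The following is only a minimal sketch: it builds a local DataFrame with the same shape instead of reading from Elasticsearch, and the names NestedArrayDemo, Item, Doc, sample, explodePoints, x and y are hypothetical, chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object NestedArrayDemo {
  // Hypothetical case classes mirroring the schema printed above.
  case class Item(id: String, field1: Long, field2: Float, innerArray: Seq[Seq[Float]])
  case class Doc(currentTime: Long, someField: String, meanError: Float, arrayField: Seq[Item])

  // Local stand-in for the DataFrame that the Elasticsearch reader would return.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Doc(1516211640000L, "someValue", 0.3082f,
        Seq(Item("000", 14L, 20.238714f,
          Seq(Seq(55.2754f, 25.1909f), Seq(55.2754f, 25.190929f)))))
    ).toDF()
  }

  // Flatten arrayField, then innerArray, into one row per inner pair.
  def explodePoints(df: DataFrame): DataFrame =
    df.select(explode(col("arrayField")).as("item"))
      .select(col("item.id").as("id"), explode(col("item.innerArray")).as("point"))
      .select(col("id"), col("point").getItem(0).as("x"), col("point").getItem(1).as("y"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-array-demo")
      .getOrCreate()
    explodePoints(sample(spark)).show(false)
    spark.stop()
  }
}
```

With the real data, explodePoints should be applicable directly to the DataFrame returned by load, assuming the schema matches the one shown above.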

Upvotes: 4
