Guillaume

Reputation: 1286

Slice array of structs using column values

I want to use Spark slice function with start and length defined as Column(s).

def slice(x: Column, start: Int, length: Int): Column

x looks like this:

|-- x: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- a: double (nullable = true)
 |    |    |-- b: double (nullable = true)
 |    |    |-- c: double (nullable = true)
 |    |    |-- d: string (nullable = true)
 |    |    |-- e: double (nullable = true)
 |    |    |-- f: double (nullable = true)
 |    |    |-- g: long (nullable = true)
 |    |    |-- h: double (nullable = true)
 |    |    |-- i: double (nullable = true)
...

Any idea how to achieve this?

Thanks!

Upvotes: 1

Views: 673

Answers (1)

Raphael Roth

Reputation: 27373

You cannot use the built-in DataFrame DSL function slice for this, because it requires constant slice bounds. You can use a UDF instead. If df is your DataFrame and it has from and until columns, you can do:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// The DataType argument is required: Spark cannot infer Seq[Row] return types.
val mySlice = udf(
  (data: Seq[Row], from: Int, until: Int) => data.slice(from, until),
  df.schema.fields.find(_.name == "x").get.dataType
)

df
  .select(mySlice($"x",$"from",$"until"))
  .show()

Alternatively, you can use a SQL expression, whose arguments may be arbitrary columns. Note the different semantics: Scala's Seq.slice takes a 0-based start and an exclusive end index, while SQL's slice(x, start, length) takes a 1-based start and a length, so the until column is interpreted as a length here:

import org.apache.spark.sql.functions.expr

df
  .select(expr("slice(x, from, until)"))
  .show()
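
For completeness, here is a minimal self-contained sketch of the SQL-expression variant; the session setup and toy data (reusing the column names x, from and until from above) are made up for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Hypothetical local session and sample data, just to make this runnable.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Each row carries its own slice bounds next to the array of structs.
val df = Seq(
  (Seq((1.0, "p"), (2.0, "q"), (3.0, "r")), 1, 2),
  (Seq((4.0, "s"), (5.0, "t")), 2, 1)
).toDF("x", "from", "until")

// SQL slice: 1-based start, third argument is a length.
df.select(expr("slice(x, from, until)").as("sliced")).show(false)

The first row keeps the first two structs, the second row only the struct at position 2. As a side note, Spark 3.1+ also adds a slice overload to the DSL that accepts Column arguments for start and length directly.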

Upvotes: 1
