I want to use the Spark slice function with start and length defined as Columns. The signature is:
def slice(x: Column, start: Int, length: Int): Column
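That is, I'd like to write something like slice($"x", $"from", $"until"), but that doesn't compile because start and length must be Int constants.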
x looks like this:
|-- x: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- a: double (nullable = true)
|    |    |-- b: double (nullable = true)
|    |    |-- c: double (nullable = true)
|    |    |-- d: string (nullable = true)
|    |    |-- e: double (nullable = true)
|    |    |-- f: double (nullable = true)
|    |    |-- g: long (nullable = true)
|    |    |-- h: double (nullable = true)
|    |    |-- i: double (nullable = true)
...
Any idea how to achieve this? Thanks!
You cannot use the built-in DataFrame DSL function slice for this, since it requires constant slice bounds, but you can use a UDF instead. If df is your DataFrame and you have from and until columns, then you can do:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Seq.slice(from, until) is 0-based with an exclusive end; the DataType
// argument tells Spark the UDF's return type (the array type of x)
val mySlice = udf(
  (data: Seq[Row], from: Int, until: Int) => data.slice(from, until),
  df.schema.fields.find(_.name == "x").get.dataType
)
df
.select(mySlice($"x",$"from",$"until"))
.show()
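For a quick end-to-end check, a toy df with this shape could be built as follows (a minimal sketch: the struct is cut down to two unnamed double fields, and from/until are the column names assumed above; df must exist before mySlice, since the UDF's return type is read from its schema):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._ // enables .toDF and the $"col" syntax

// array<struct<_1:double,_2:double>> plus per-row slice bounds
val df = Seq(
  (Seq((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)), 0, 2),
  (Seq((7.0, 8.0), (9.0, 10.0)), 1, 2)
).toDF("x", "from", "until")

With this data, the select above keeps the first two elements of the first array and only the second element of the second array.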
Alternatively, you can use a SQL expression in Spark SQL:
import org.apache.spark.sql.functions.expr

df
  .select(expr("slice(x, from, until)"))
  .show()
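One caveat worth flagging: SQL's slice(x, start, length) takes a 1-based start and a length, while Seq.slice takes a 0-based start and an exclusive end, so the two variants interpret from and until differently. To make the SQL form match the UDF's semantics, you can shift the bounds inside the expression:

// translate 0-based from / exclusive until into SQL's 1-based start and a length
df
  .select(expr("slice(x, from + 1, until - from)"))
  .show()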