Reputation: 59
I'm trying to get the union of an ArrayBuffer[Dataset[_]]. So I wrote the following code:
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Dataset

var buffer: ArrayBuffer[Dataset[_]] = ArrayBuffer.empty[Dataset[_]]
var size: Long = 0
...
if (size < 1000) {
  buffer.append(df)
  size = size + df.count()
} else {
  val unionedDataset = buffer.reduce(_ union _)
}
I get the following error:
type mismatch;
[error] found : org.apache.spark.sql.Dataset[_$2(in value $anonfun)] where type _$2(in value $anonfun)
[error] required: org.apache.spark.sql.Dataset[_$2(in variable buffer)]
[error] val unionedDataset = buffer.reduce(_ union _)
[error] ^
Shouldn't the type of the second argument in the anonymous function be the same as the type of the element at the index being referenced?
Upvotes: 0
Views: 591
Reputation: 59
I figured out that I can avoid this issue by doing the following:
val unionedDataset = buffer.reduce(_.toDF() union _.toDF())
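For context, here is a minimal self-contained sketch of that workaround; the local SparkSession and the sample data are illustrative assumptions, not from the original post:

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("union-example").getOrCreate()
import spark.implicits._

// Two datasets whose element types are forgotten once they are stored as Dataset[_]
val buffer: ArrayBuffer[Dataset[_]] = ArrayBuffer(
  Seq(1, 2, 3).toDS(),
  Seq(4, 5, 6).toDS()
)

// Converting each element to a DataFrame (Dataset[Row]) gives both sides of the
// union a common static type, so the reduce type-checks; the schemas still have
// to be compatible at runtime.
val unioned = buffer.reduce(_.toDF() union _.toDF())
unioned.show()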
Upvotes: 1
Reputation: 27373
You could use Any instead of _; this should also work:
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Dataset

var buffer: ArrayBuffer[Dataset[Any]] = ArrayBuffer.empty[Dataset[Any]]
var size: Long = 0
...
if (size < 1000) {
  buffer.append(df.asInstanceOf[Dataset[Any]])
  size = size + df.count()
} else {
  val unionedDataset = buffer.reduce(_ union _)
}
Upvotes: 0
Reputation: 170815
An ArrayBuffer[Dataset[_]] can contain e.g. a Dataset[String] and a Dataset[Int] at the same time, and union isn't defined between them.

If you had ArrayBuffer[Dataset[T]] forSome { type T }, you could write buffer.reduce(_ union _), but then buffer.append(df) won't work: df must have type Dataset[T], but you don't know what T is.
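A small sketch of that distinction (the sample data and SparkSession setup here are assumptions for illustration): if every element shares one known type, say DataFrame (i.e. Dataset[Row]), the reduce compiles, because both arguments of union are guaranteed to have the same T.

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("union-example").getOrCreate()
import spark.implicits._

// With a single, known element type the union reduce compiles:
val frames: ArrayBuffer[DataFrame] = ArrayBuffer(
  Seq(1, 2, 3).toDF("n"),
  Seq(4, 5, 6).toDF("n")
)
val unioned: DataFrame = frames.reduce(_ union _) // fine: both sides are Dataset[Row]

// With ArrayBuffer[Dataset[_]], each element may have a different, unknown T,
// so the compiler cannot prove that the two arguments of `_ union _` share a type.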
Upvotes: 0