Reputation: 761
I have a collection containing three DataFrames of the same type (same Parquet schema). They differ only in the content/values they contain:
I want to flatten the structure, so that the three DataFrames get merged into one single DataFrame containing all of the content/values.
I tried it with flatten and flatMap, but with that I always get the error:
Error: No implicit view available from org.apache.spark.sql.DataFrame => Traversable[U].
    parquetFiles.flatten
Error: not enough arguments for method flatten: (implicit asTrav: org.apache.spark.sql.DataFrame => Traversable[U], implicit m: scala.reflect.ClassTag[U]). Unspecified value parameters asTrav, m.
    parquetFiles.flatten
I also converted it to a List and then tried to flatten it, which produces the same error. Do you have any idea how to solve this, or what the problem is here? Thanks, Alex
Upvotes: 0
Views: 1574
Reputation: 21730
The Scala compiler is looking for a way to convert the DataFrames to a Traversable so it can apply flatten. But a DataFrame is not Traversable, so it fails. Also, no ClassTag is available, because DataFrames are not statically typed.
The code you're looking for is parquetFiles.reduce(_ unionAll _), which can be optimized by the DataFrame execution engine.
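A minimal sketch of how this fits together, assuming Spark 1.x (where unionAll is the DataFrame union method; in Spark 2+ it is called union) and placeholder Parquet paths of my own invention:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Assumes an existing SQLContext; the paths below are hypothetical
// placeholders for your three Parquet sources with a shared schema.
def mergeParquet(sqlContext: SQLContext): DataFrame = {
  val parquetFiles: Seq[DataFrame] = Seq(
    sqlContext.read.parquet("/data/part1"),
    sqlContext.read.parquet("/data/part2"),
    sqlContext.read.parquet("/data/part3")
  )

  // Equivalent to part1.unionAll(part2).unionAll(part3);
  // Catalyst can optimize the resulting union plan.
  parquetFiles.reduce(_ unionAll _)
}
```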
Upvotes: 3
Reputation: 7442
So it seems like you want to combine these three DataFrames, and the unionAll function works well for that. You could do parquetFiles.reduce((x, y) => x.unionAll(y)) (note this will throw on an empty list, so if that can happen, use one of the folds instead of reduce).
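A sketch of the fold-based variants that survive an empty collection, assuming Spark 1.x unionAll; mergeAll and mergeAllOpt are hypothetical helper names, and the empty seed DataFrame must share the common schema:

```scala
import org.apache.spark.sql.DataFrame

// reduce on an empty Seq throws UnsupportedOperationException;
// foldLeft with an empty DataFrame of the same schema avoids that.
def mergeAll(parquetFiles: Seq[DataFrame], empty: DataFrame): DataFrame =
  parquetFiles.foldLeft(empty)(_ unionAll _)

// Alternatively, keep reduce but make the empty case explicit:
def mergeAllOpt(parquetFiles: Seq[DataFrame]): Option[DataFrame] =
  parquetFiles.reduceOption(_ unionAll _)
```

reduceOption returns None for an empty input instead of throwing, which pushes the "no data" decision to the caller.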
Upvotes: 2