I need to work with an RDD both through Scala/Spark methods and through SQL queries.
Is it possible to query an RDD directly with SQL?
The suggested approaches (SchemaRDD or DataFrame) seem to require extra memory:
after such a conversion I end up with two identical huge objects in memory.
Yes, in a way, you may be able to do so, but you would need to build your own version of DataFrame.
A DataFrame is an abstraction over RDDs. The joins, filters and other features you find in Spark SQL are optimized for DataFrames, but they are built on top of RDDs, which offered those operations first.
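To make the relationship concrete, here is a minimal sketch, assuming Spark 2.x or later with `spark-sql` on the classpath and local mode. The `Person`/`Order` types, the view names and the data are hypothetical. It runs the same filter-and-join twice: once with plain RDD methods (`filter`, `map`, `join`) and once with `spark.sql` over temporary views registered from the same RDDs, which is the DataFrame route mentioned in the question. Note that `toDF()` and `createOrReplaceTempView` only wrap the RDD lineage; rows are computed when an action such as `collect()` runs, and are kept in memory only if you `cache()` or `persist()` something.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical record types, used only for this illustration.
case class Person(id: Int, name: String, age: Int)
case class Order(personId: Int, item: String)

object RddSqlSketch {
  // Runs the same filter + join twice: once with plain RDD methods,
  // once with Spark SQL over temp views backed by the same RDDs.
  def run(): (Seq[(String, String)], Seq[(String, String)]) = {
    val spark = SparkSession.builder()
      .appName("rdd-sql-sketch")
      .master("local[*]") // local mode is an assumption for a quick run
      .getOrCreate()
    import spark.implicits._

    try {
      val people: RDD[Person] = spark.sparkContext.parallelize(Seq(
        Person(1, "Ann", 34), Person(2, "Bob", 19), Person(3, "Cid", 42)))
      val orders: RDD[Order] = spark.sparkContext.parallelize(Seq(
        Order(1, "book"), Order(2, "lamp"), Order(3, "pen")))

      // 1) Plain RDD API: filter, key both sides by id, then join.
      val viaRdd = people
        .filter(_.age >= 21)
        .map(p => (p.id, p.name))
        .join(orders.map(o => (o.personId, o.item)))
        .values
        .collect()
        .sorted
        .toSeq

      // 2) Spark SQL: wrap the same RDDs as DataFrames and register views.
      // Nothing is computed here; the query runs at collect() below.
      people.toDF().createOrReplaceTempView("people")
      orders.toDF().createOrReplaceTempView("orders")
      val viaSql = spark.sql(
        """SELECT p.name, o.item
          |FROM people p JOIN orders o ON p.id = o.personId
          |WHERE p.age >= 21""".stripMargin)
        .as[(String, String)] // tuple columns are mapped by position
        .collect()
        .sorted
        .toSeq

      (viaRdd, viaSql)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (viaRdd, viaSql) = run()
    println(s"RDD API: $viaRdd")
    println(s"SQL:     $viaSql")
  }
}
```

Both paths should yield the same pairs (Ann with book, Cid with pen). If you also need the SQL result as an RDD again, a DataFrame exposes it through its `.rdd` method, which returns an `RDD[Row]`.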