Reputation: 71
I am new to Scala and trying to grasp the language fundamentals. I have working knowledge of Spark with the Java API.
I am having a hard time understanding some Scala code, and therefore I am not able to write the same in Java. I got this piece of code from https://learn.microsoft.com/en-us/azure/cosmos-db/spark-connector:
// Import Necessary Libraries
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark._
import com.microsoft.azure.cosmosdb.spark.config.Config
// Read Configuration
val readConfig = Config(Map(
"Endpoint" -> "https://doctorwho.documents.azure.com:443/",
"Masterkey" -> "YOUR-KEY-HERE",
"Database" -> "DepartureDelays",
"Collection" -> "flights_pcoll",
"query_custom" -> "SELECT c.date, c.delay, c.distance, c.origin, c.destination FROM c WHERE c.origin = 'SEA'" // Optional
))
// Connect via azure-cosmosdb-spark to create Spark DataFrame
val flights = spark.read.cosmosDB(readConfig)
flights.count()
As far as I know, the read method returns an object of type org.apache.spark.sql.DataFrameReader, and this does not have any method cosmosDB(). How, then, is this code working? Also, how do I convert this code to Java?
Thank You
Upvotes: 0
Views: 211
Reputation: 20571
What you are seeing is the magic of Scala implicit conversions. The compiler sees that you intend to call the cosmosDB method of a DataFrameReader and that there's no method of that name with the proper signature, as you note.
When you
import com.microsoft.azure.cosmosdb.spark.schema._
you also import the contents of the package object (current git commit as of this writing, last updated in 2017 so it's stable code). The relevant bit that gets imported is
implicit def toDataFrameReaderFunctions(dfr: DataFrameReader): DataFrameReaderFunctions
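For context, that conversion lives in the connector's schema package object, roughly like this (a simplified sketch based on the signature above, not a verbatim copy of the source):

package com.microsoft.azure.cosmosdb.spark

import org.apache.spark.sql.DataFrameReader

package object schema {
  // Wraps a plain DataFrameReader in the class that actually defines cosmosDB
  implicit def toDataFrameReaderFunctions(dfr: DataFrameReader): DataFrameReaderFunctions =
    new DataFrameReaderFunctions(dfr)
}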
An implicit def which takes one argument signals to the compiler that, if this def is in scope, the compiler can insert a call to it when:
- an expression of type DataFrameReader is used where a DataFrameReaderFunctions is expected, or
- a method is called on a DataFrameReader which DataFrameReader itself does not define, but com.microsoft.azure.cosmosdb.spark.schema.DataFrameReaderFunctions has a member with the desired name and signature
Since DataFrameReaderFunctions has a method cosmosDB, the compiler then translates your code to
toDataFrameReaderFunctions(spark.read).cosmosDB(readConfig)
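Here is a minimal, self-contained sketch of the same mechanism, using a made-up GreetingOps enrichment on String rather than the connector's classes:

import scala.language.implicitConversions

// A wrapper class holding the "added" method
class GreetingOps(val s: String) {
  def greet: String = s"Hello, $s!"
}

object EnrichmentDemo {
  // With this conversion in scope, "world".greet compiles:
  // the compiler rewrites it to toGreetingOps("world").greet
  implicit def toGreetingOps(s: String): GreetingOps = new GreetingOps(s)

  def main(args: Array[String]): Unit = {
    println("world".greet) // prints: Hello, world!
  }
}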
This general approach of using an implicit conversion to make it look like you're adding methods to a type, without modifying the type itself, is called enrichment or an extension method. Implicit conversions in general should probably be avoided: they very often make code hard to follow, and an errant implicit conversion in scope can make code compile that you never intended to compile. For an enrichment like this, there's an alternative: use an implicit class, where the compiler essentially autogenerates the implicit conversion for you. An implicit class only adds methods, though, so it doesn't let you perform a general conversion such as using an Int in place of a String.
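For comparison, here is the same hypothetical enrichment written as an implicit class; the compiler generates the conversion, but it is only applied for method lookup:

object ImplicitClassDemo {
  // The compiler derives the String => GreetingOps conversion from this;
  // extending AnyVal avoids allocating the wrapper at runtime
  implicit class GreetingOps(private val s: String) extends AnyVal {
    def greet: String = s"Hello, $s!"
  }

  def main(args: Array[String]): Unit = {
    println("world".greet) // prints: Hello, world!
  }
}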
Upvotes: 1