Reputation: 421
I am trying to load a CSV file in Spark using Scala. I see that this can be done with either of the two syntaxes below:
sqlContext.read.format("csv").options(option).load(path)
sqlContext.read.options(option).csv(path)
What is the difference between these two, and which one gives better performance? Thanks
Upvotes: 1
Views: 561
Reputation: 37822
There's no difference.
So why do both exist?
The .format(fmt).load(path) method is a flexible, pluggable API that allows adding more formats without having to recompile Spark: you can register aliases for custom Data Source implementations and have Spark use them. "csv" used to be such a custom implementation (outside of the packaged Spark binaries), but it is now part of the project.
There are also shorthand methods for the built-in formats (csv, parquet, json ...), which make the code a bit simpler (and the format name is checked at compile time rather than passed as a string).
Ultimately, they both create a CSV Data Source and use it to load the data.
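As a rough illustration of the point that both calls end up loading the same data, here is a minimal sketch that reads one file both ways and checks that the schemas and rows match. It assumes Spark 2.x or later on the classpath and uses SparkSession (spark.read returns the same DataFrameReader as sqlContext.read). The object name, the options map, and the temporary sample file are made up for the example and are not from the question.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvLoadComparison {
  // Hypothetical options standing in for `option` in the question.
  val csvOptions: Map[String, String] =
    Map("header" -> "true", "inferSchema" -> "true")

  // Generic, pluggable form: the format is chosen by a string alias.
  def loadWithFormat(spark: SparkSession, path: String): DataFrame =
    spark.read.format("csv").options(csvOptions).load(path)

  // Shorthand form: the format is chosen by the method name.
  def loadWithShorthand(spark: SparkSession, path: String): DataFrame =
    spark.read.options(csvOptions).csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-load-comparison")
      .getOrCreate()
    try {
      // Write a tiny sample file so the sketch does not depend on external data.
      val file = Files.createTempFile("people", ".csv")
      Files.write(file, "name,age\nalice,30\nbob,25\n".getBytes("UTF-8"))

      val viaFormat = loadWithFormat(spark, file.toString)
      val viaShorthand = loadWithShorthand(spark, file.toString)

      // Both readers should infer the same schema and return the same rows.
      assert(viaFormat.schema == viaShorthand.schema)
      assert(viaFormat.collect().toSeq == viaShorthand.collect().toSeq)
    } finally {
      spark.stop()
    }
  }
}
```

This only compares results on a small file, so it is a sanity check rather than a benchmark. The two forms differ only in how the format is named.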
Bottom line: for any supported format, you should opt for the "shorthand" method, e.g. csv(path).
Upvotes: 3