Reputation: 111
I'm trying to create a DataFrame in Scala as follows:
var olympics =spark.read.csv("/FileStore/tables/Soccer_Data_Set_c46d1.txt").option("inferSchema","true").option("header","true").option("delimiter",",")
When I submit the code, it fails with the error: value option is not a member of org.apache.spark.sql.DataFrame
However, when I modify the code as follows:
var olympics = spark.read.option("inferSchema","true").option("header","true").option("delimiter",",").csv("/FileStore/tables/Soccer_Data_Set_-c46d1.txt")
the olympics DataFrame is created successfully.
Can someone please help me understand the difference between these two code snippets?
Upvotes: 0
Views: 3254
Reputation: 1
In the first snippet: calling 'read.csv("/FileStore/tables/Soccer_Data_Set_c46d1.txt")' returns an 'org.apache.spark.sql.Dataset' object ('DataFrame', the type named in your error message, is an alias for 'Dataset[Row]'). This class does not define any 'option()' method, yet you try to invoke one afterwards ('csv(..).option("inferSchema", "true")'), so the compiler rejects the call with that error.
See the Dataset class API documentation, which contains no definition of an 'option()' method.
In the second snippet: calling 'spark.read' returns an 'org.apache.spark.sql.DataFrameReader' object. This class defines several overloaded 'option' methods, and because you are calling one of the valid overloads, the compiler reports no error.
See the DataFrameReader class API documentation, which lists the overloaded 'option()' methods.
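To make the types at each step visible, here is a minimal sketch of the working call chain with explicit type annotations. The object name 'CsvReadTypes' and the helper 'readCsv' are made up for illustration; it assumes Spark 2.x or later on the classpath and a SparkSession supplied by the caller.

```scala
import org.apache.spark.sql.{DataFrame, DataFrameReader, SparkSession}

// Hypothetical helper, not part of Spark: it only spells out the intermediate types.
object CsvReadTypes {
  def readCsv(spark: SparkSession, path: String): DataFrame = {
    // spark.read returns a DataFrameReader, which is where option() is defined.
    val reader: DataFrameReader = spark.read

    // Each option() call returns a DataFrameReader again, so the calls chain.
    val configured: DataFrameReader = reader
      .option("inferSchema", "true")
      .option("header", "true")
      .option("delimiter", ",")

    // csv() ends the chain and returns a DataFrame, which has no option() member.
    configured.csv(path)
  }
}
```

Something like 'val olympics = CsvReadTypes.readCsv(spark, path)' should then behave like the second snippet in the question.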
Upvotes: 0
Reputation: 1859
After you've called the csv method, you already have a DataFrame, and the data has already been read "into" Spark, so it doesn't make sense to set options there.
In the second example, you call read to "say" that you want Spark to read a file, then set the properties of that read, and only then actually read the file.
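The same "configure first, then read" ordering can be illustrated without Spark. The sketch below is a toy model only: 'Reader', 'Frame' and 'ReaderDemo' are hypothetical stand-ins, not Spark's real classes. It shows why the configuration calls have to come before the terminal call.

```scala
// Hypothetical stand-in for the result type: it has no option() method.
final case class Frame(path: String, options: Map[String, String])

// Hypothetical stand-in for the reader: it collects options until csv() is called.
final case class Reader(options: Map[String, String] = Map.empty) {
  // Configuration step: returns a new Reader, so calls can be chained.
  def option(key: String, value: String): Reader =
    copy(options = options + (key -> value))

  // Terminal step: uses the collected options and returns the result type.
  def csv(path: String): Frame = Frame(path, options)
}

object ReaderDemo {
  // Options first, terminal csv() call last.
  def load(path: String): Frame =
    Reader()
      .option("inferSchema", "true")
      .option("header", "true")
      .csv(path)

  // Reader().csv(path).option("header", "true") would not compile,
  // because Frame does not define an option() member.
}
```

In this model, swapping the order fails at compile time for the same kind of reason as in the question: the terminal call returns a type that no longer exposes the configuration methods.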
Upvotes: 1