Reputation: 1
I am new to and still learning Apache Spark and Scala. I am trying to analyze a dataset and have loaded it into Spark. However, when I try to perform a basic analysis such as max, min, or average, I get an error:
error: value select is not a member of org.apache.spark.rdd.RDD[Array[String]]
Could anyone please shed some light on this? I am running Spark on an organization's cloud lab.
Code:
// Reading in the csv file
val df = sc.textFile("/user/Spark/PortbankRTD.csv").map(x => x.split(","))
// Select Max of Age
df.select(max($"age")).show()
Error:
<console>:40: error: value select is not a member of org.apache.spark.rdd.RDD[Array[String]]
df.select(max($"age")).show()
Please let me know if you need any more information. Thanks.
Upvotes: 0
Views: 247
Reputation: 5304
Following up on my comment, the textFile method returns an RDD[String]. select is a method on DataFrame. You will need to convert your RDD[String] into a DataFrame. You can do this in a number of ways. One example is:
import spark.implicits._
val rdd = sc.textFile("/user/Spark/PortbankRTD.csv") // RDD[String], one element per line
val df = rdd.toDF()                                  // DataFrame with a single string column named "value"
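Note that calling toDF() on an RDD[String] gives you a single string column named "value", so to aggregate a specific field such as age you still need to split each line and name the columns. A minimal sketch of one way to do that, assuming age is the first field and the file has no header row (both are assumptions about PortbankRTD.csv):

import spark.implicits._
import org.apache.spark.sql.functions.max

// Split each line, take the (assumed) first field as age, and name the column.
// If the file has a header row, it must be filtered out before toInt is called.
val ageDf = sc.textFile("/user/Spark/PortbankRTD.csv")
  .map(_.split(","))
  .map(cols => cols(0).toInt)
  .toDF("age")

ageDf.select(max($"age")).show()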
There are also built-in readers for many types of input files: spark.read.csv("/user/Spark/PortbankRTD.csv") returns a DataFrame immediately.
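A minimal sketch using that reader and then running the original aggregation; header and inferSchema are standard Spark CSV reader options, but whether PortbankRTD.csv actually has a header row is an assumption:

import spark.implicits._
import org.apache.spark.sql.functions.max

// Built-in CSV reader: returns a DataFrame directly, with typed columns
// when inferSchema is enabled (assumes the file has a header row).
val bankDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/user/Spark/PortbankRTD.csv")

bankDf.select(max($"age")).show()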
Upvotes: 3