Reputation: 141
I have a CSV file in HDFS. How can I query this file with Spark SQL? For example, I would like to select specific columns and store the result back in the Hadoop Distributed File System.
Thanks
Reputation: 966
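// Read the CSV from HDFS; "header" = "true" takes column names from the first row
val df = sparkSession.read.option("header", "true").csv("path to your file in HDFS")
// Keep only the columns you need
val df2 = df.select("field1", "field2")
// Write the result back to HDFS as CSV (Spark writes a directory of part files)
df2.write.option("header", "true").csv("path to a new file in HDFS")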
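Because the question asks specifically about Spark SQL, the same selection can also be written as a SQL query by registering the DataFrame as a temporary view. The sketch below assumes the CSV has a header row; the object name `CsvSql`, the function `queryCsv`, and the view name `input` are hypothetical, and the paths may be `hdfs://` URIs or local paths.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvSql {
  // Hypothetical helper: reads a CSV that has a header row, runs a Spark SQL
  // query over it, and writes the result back out as CSV.
  def queryCsv(spark: SparkSession, inputPath: String, query: String, outputPath: String): DataFrame = {
    val df = spark.read
      .option("header", "true") // first row holds the column names
      .csv(inputPath)

    // Expose the DataFrame to SQL under the (hypothetical) view name "input"
    df.createOrReplaceTempView("input")

    val result = spark.sql(query)

    // Spark writes a directory of part files, not a single CSV file;
    // "overwrite" replaces the output directory if it already exists
    result.write.option("header", "true").mode("overwrite").csv(outputPath)
    result
  }
}
```

Usage might look like `CsvSql.queryCsv(spark, "hdfs:///user/me/input.csv", "SELECT field1, field2 FROM input", "hdfs:///user/me/output")`, where both HDFS paths are placeholders.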
Reputation: 152
You can achieve this by creating a DataFrame.
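import spark.implicits._ // enables .toDF() on RDDs of case classes
case class Person(name: String, age: Int)

val dataFrame = spark.sparkContext
  .textFile("examples/src/main/resources/people.csv")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()
// DataFrame has no .sql method: register a temporary view, then query it through the session
dataFrame.createOrReplaceTempView("people")
spark.sql("<sql query>")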
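Splitting lines on "," by hand breaks as soon as a field contains a quoted comma. As an alternative (a different technique from the one above), Spark's built-in CSV reader can apply an explicit schema instead. This is a sketch that assumes a headerless file of `name, age` rows like the one parsed manually above; the object `PeopleCsv`, the function `adults`, and the filter `age >= 18` are purely illustrative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object PeopleCsv {
  // Schema matching the (name, age) columns parsed manually above
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)
  ))

  // Hypothetical example query: names of people aged 18 or older
  def adults(spark: SparkSession, path: String): DataFrame = {
    val people = spark.read
      .schema(schema)                            // explicit types, no inference pass
      .option("ignoreLeadingWhiteSpace", "true") // tolerate rows like "Michael, 29"
      .csv(path)
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age >= 18")
  }
}
```

The CSV reader handles quoting and escaping, so the manual `split` and the `Person` case class are no longer needed.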