Hattabi Maher

Reputation: 141

Query HDFS with Spark SQL

I have a CSV file in HDFS. How can I query this file with Spark SQL? For example, I would like to run a SELECT on specific columns and store the result back to the Hadoop distributed file system.

Thanks

Upvotes: 2

Views: 5813

Answers (2)

Anton Okolnychyi

Reputation: 966

  1. You should create a SparkSession. An example is here.
  2. Load a CSV file: val df = sparkSession.read.csv("path to your file in HDFS").
  3. Perform your select operation: val df2 = df.select("field1", "field2").
  4. Write the results back: df2.write.csv("path to a new file in HDFS")
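The four steps above can be put together in one sketch. The file paths and the column names `field1`/`field2` are placeholders carried over from the answer; the `header` option is an added assumption that the CSV's first row holds column names:

```scala
import org.apache.spark.sql.SparkSession

object CsvSelectExample {
  def main(args: Array[String]): Unit = {
    // 1. Create a SparkSession (local master here just for illustration).
    val sparkSession = SparkSession.builder()
      .appName("csv-select")
      .master("local[*]")
      .getOrCreate()

    // 2. Load the CSV file from HDFS; `header` takes column names
    //    from the first row (path is a placeholder).
    val df = sparkSession.read
      .option("header", "true")
      .csv("hdfs:///path/to/input.csv")

    // 3. Perform the select on the columns of interest
    //    ("field1", "field2" are placeholder names).
    val df2 = df.select("field1", "field2")

    // 4. Write the result back to HDFS (path is a placeholder).
    df2.write.csv("hdfs:///path/to/output")

    sparkSession.stop()
  }
}
```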

Upvotes: 1

Prudvi Sagar

Reputation: 152

You can achieve this by creating a DataFrame, registering it as a temporary view, and then querying it through the SparkSession:

case class Person(name: String, age: Int)

import spark.implicits._  // needed for .toDF()

val dataFrame = spark.sparkContext
  .textFile("examples/src/main/resources/people.csv")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()

// DataFrame has no .sql method; register a view and query via spark.sql
dataFrame.createOrReplaceTempView("people")
spark.sql("<sql query>")
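For a runnable illustration of the temp-view pattern, here is a self-contained sketch; the inline sample data, the view name `people`, and the query are all hypothetical stand-ins for the CSV-derived DataFrame:

```scala
import org.apache.spark.sql.SparkSession

object SqlOnViewExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-on-view")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Build a small DataFrame in place of the CSV-derived one.
    val dataFrame = Seq(("Alice", 30), ("Bob", 15)).toDF("name", "age")

    // Register it as a temporary view so SQL can reference it by name.
    dataFrame.createOrReplaceTempView("people")

    // Run a real SQL query; the result is itself a DataFrame.
    val adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
    adults.show()

    // The result could then be stored back to HDFS, e.g.:
    // adults.write.csv("hdfs:///path/to/output")

    spark.stop()
  }
}
```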

Upvotes: 1
