rajkumar chilukuri

Reputation: 245

How to Execute sql queries in Apache Spark

I am very new to Apache Spark.
I have already configured Spark 2.0.2 on my local Windows machine and worked through the "word count" example.
Now I am stuck on executing SQL queries. I have searched for guidance on this but have not found a proper answer.

Upvotes: 12

Views: 70492

Answers (5)

Pankaj Shet

Reputation: 189

SQL queries executed via spark.sql() and the equivalent Dataset/DataFrame API calls compile down to exactly the same execution plan: both go through the Catalyst optimizer at planning time and Adaptive Query Execution (AQE) at runtime. You can choose whichever style you are comfortable with.

Upvotes: 0

GaneshMuni

Reputation: 51

You should already have your Spark DataFrame.
Create a temp view from the DataFrame, then query it through the SparkSession (note: spark.sql, not sc.sql -- the SparkContext has no sql method):

df.createOrReplaceTempView("dftable")
dfsql = spark.sql("select * from dftable")

You can also write longer queries as a multi-line string:

sql_statement = """
select sensorid, objecttemp_c,
year(DateTime) as year_value,
month(DateTime) as month_value,
day(DateTime) as day_value,
hour(DateTime) as hour_value
from dftable
order by 1 desc
"""

dfsql = spark.sql(sql_statement)

Upvotes: 2

Brian

Reputation: 976

It's rather simple now in Spark to run SQL queries. You can run SQL against DataFrames as others pointed out, but the question is really how to execute SQL at all:

spark.sql("SHOW TABLES;")

That's it.

Upvotes: 0

Shivansh

Reputation: 3544

You need to do the following to get it done.

In Spark 2.0.2 there is a SparkSession, which wraps both the SparkContext instance and the sqlContext instance.

Hence the steps would be:

Step 1: Create SparkSession

val spark = SparkSession.builder().appName("MyApp").master("local[*]").getOrCreate()

Step 2: Load from the database, in your case MySQL. (Note: the JDBC option key is "dbtable" and its value is the table name.)

val loadedData = spark
      .read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/mydatabase")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "mytable")
      .option("user", "root")
      .option("password", "toor")
      .load()
loadedData.createOrReplaceTempView("mytable")

Step 3: Now you can run your SQL query just as you would against a SQL database.

val dataFrame = spark.sql("SELECT * FROM mytable")
dataFrame.show()

P.S.: It would be better to use the DataFrame API, or better still the Dataset API, but for those you need to go through the documentation.

Link to Documentation: https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.sql.Dataset

Upvotes: 16

Glennie Helles Sindholt

Reputation: 13154

In Spark 2.x you no longer reference sqlContext, but rather spark (the SparkSession), so you need to do:

spark
  .read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/mydb")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "mydb")
  .option("user", "root")
  .option("password", "")
  .load()

Upvotes: 9
