Reputation: 945
I think I am missing something but can't figure what. I want to load data using SQLContext and JDBC using particular sql statement like
select top 1000 text from table1 with (nolock)
where threadid in (
select distinct id from table2 with (nolock)
where flag=2 and date >= '1/1/2015' and userid in (1, 2, 3)
)
Which method of SQLContext should I use? Examples I saw always specify table name and lower and upper margin.
Thanks in advance.
Upvotes: 3
Views: 8888
Reputation: 2390
val url = "jdbc:postgresql://localhost/scala_db?user=scala_user"
Class.forName(driver)
val connection = DriverManager.getConnection(url)
val df2 = spark.read
.format("jdbc")
.option("url", url)
.option("dbtable", "(select id,last_name from emps) e")
.option("user", "scala_user")
.load()
The key is "(select id,last_name from emps) e", here you can write a subquery in place of table_name.
Upvotes: -3
Reputation: 330093
You should pass a valid subquery as a dbtable
argument. For example in Scala:
val query = """(SELECT TOP 1000
-- and the rest of your query
-- ...
) AS tmp -- alias is mandatory*"""
val url: String = ???
val jdbcDF = sqlContext.read.format("jdbc")
.options(Map("url" -> url, "dbtable" -> query))
.load()
* Hive Language Manual SubQueries: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
Upvotes: 8