Jayanth Venkat

Reputation: 9

Spark Scala count function is not responding

I am loading some data from an Oracle DB and trying to run a count action on the data, but I am not getting any result and the session eventually times out. If I perform other functions such as first() or show(), I do get a result.

I tried launching the Scala shell with different options, as shown below.

 spark2-shell --jars /oracle/12c/product/12.1.0/client/jdbc/lib/ojdbc6.jar --executor-memory 10G  --num-executors 10 --driver-memory 2G yarn-client 

 spark2-shell --jars /oracle/12c/product/12.1.0/client/jdbc/lib/ojdbc6.jar 

val df = spark.read
.format("jdbc")
.option("url","jdbc:oracle:thin:@//MY_IP:MY_PORT/MY_SERVICE_NAME")
.option("dbtable","MY_TABLE")
.option("query", "select 'a','6b',count(*) from MY_TABLE_NAME partition 
for (to_date('07-MAY-2019','DD-MON-YYYY')) where COL='SOMETHING'")
.option("driver","oracle.jdbc.driver.OracleDriver")
.option("user", "MY_USERNAME")
.option("password", "MYPASSWORD")
.load()


 df.count()

This does not return any result.

I expect the count function to return the total number of records fetched from the table.

Upvotes: 0

Views: 139

Answers (1)

user11466750

Reputation: 11

There is no such option as query, so your code fetches all the data from the external table; furthermore, it does so using only a single partition.

The correct option is dbtable, and the query, as per the linked answer, should be provided as a subquery:

.option("dbtable", "(select 'a','6b',count(*) from MY_TABLE_NAME partition 
  for (to_date('07-MAY-2019','DD-MON-YYYY')) where COL='SOMETHING') as t")
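For reference, a minimal sketch of the complete read with the subquery passed through dbtable; the connection details, table name, and credentials are the placeholders from the question, not real values, and countDf is just an illustrative name:

// Sketch only: the placeholders (MY_IP, MY_PORT, MY_SERVICE_NAME, MY_TABLE_NAME,
// MY_USERNAME, MYPASSWORD) come from the question and must be replaced.
val countDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//MY_IP:MY_PORT/MY_SERVICE_NAME")
  .option("dbtable", "(select 'a','6b',count(*) from MY_TABLE_NAME partition for (to_date('07-MAY-2019','DD-MON-YYYY')) where COL='SOMETHING') t")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .option("user", "MY_USERNAME")
  .option("password", "MYPASSWORD")
  .load()

// The aggregation now runs on the Oracle side, so only a single row
// is transferred and show() (or first()) returns quickly.
countDf.show()

If you do need to pull the raw rows and count them in Spark instead, the JDBC read can be parallelised with the partitionColumn, lowerBound, upperBound and numPartitions options rather than relying on a single connection.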

Upvotes: 1
