Reputation: 503
I am using Spark SQL where I want to fetch whole data everyday from a Oracle table(consist of more than 1800k records). The application is hanging up when I read from Oracle hence I used concept of partitionColumn,lowerBound & upperBound. But,the problem is how can I get lowerBound & upperBound value of primary key column dynamically?? Every day value of lowerBound & upperBound will be changing.Hence how can I get the boundary values of primary key column dynamically?? Can anyone guide me an sample example for my problem?
Upvotes: 4
Views: 2050
Reputation: 330413
Just fetch required values from the database:
url = ...
properties = ...
partition_column = ...
table = ...
# Push aggregation to the database
query = "(SELECT min({0}), max({0}) FROM {1}) AS tmp".format(
partition_column, table
)
(lower_bound, upper_bound) = (spark.read
.jdbc(url=url, table=query. properties=properties)
.first())
and pass to the main query:
num_partitions = ...
spark.read.jdbc(
url, table,
column=partition_column,
# Make upper bound inclusive
lowerBound=lower_bound, upperBound=upper_bound + 1,
numPartitions=num_partitions, properties=properties
)
Upvotes: 5