BdEngineer

Reputation: 3179

How to define/design custom partitions for a Spark app using the Cassandra connector

I am using the spark-cassandra-connector and need to fetch data from an Oracle table. The table has "fiscal_year" and "date_of_creation" columns. Currently I have set

.option("lowerBound", 2000);
.option("upperBound",2020);
.option("partitionColumn", "fiscal_year");

// this works, but it results in a lot of skew in the data; as a result the Spark job runs for hours.

Hence I would like to use the "date_of_creation" column as the partitioning key, as below:

.option("lowerBound", "31-MAR-02");
.option("upperBound", "01-MAY-19");
.option("partitionColumn", "date_of_creation");  

But this gives the error "ORA-00932: inconsistent datatypes: expected DATE got NUMBER".

What is wrong here? Is there any way to set multiple columns, like

option("partitionColumn", ["date_of_creation" ,"fiscal_year"]); 

For some records in the Oracle table, "fiscal_year" is null. How do I write a custom partitioner in that case?

Upvotes: 0

Views: 77

Answers (1)

Ged

Reputation: 18003

The upper and lower bounds must be numeric, and so must the corresponding partitioning column. It's that simple: not a DATE type or its String equivalent. You can, of course, use a numeric equivalent of a date.
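A minimal sketch of that idea: expose the date as a numeric YYYYMMDD value via a subquery pushed down to Oracle, and partition on that. The table name, JDBC URL, credentials and the "creation_num" alias here are assumptions for illustration only.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("oracle-partitioned-read").getOrCreate()

    // Subquery aliased as "src"; TO_NUMBER(TO_CHAR(...)) turns the DATE into a number.
    val query =
      """(SELECT t.*,
        |        TO_NUMBER(TO_CHAR(t.date_of_creation, 'YYYYMMDD')) AS creation_num
        |   FROM my_table t) src""".stripMargin

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/SERVICE")  // assumed URL
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("dbtable", query)
      .option("partitionColumn", "creation_num")  // numeric stand-in for the date
      .option("lowerBound", "20020331")           // 31-MAR-02 as YYYYMMDD
      .option("upperBound", "20190501")           // 01-MAY-19 as YYYYMMDD
      .option("numPartitions", "24")
      .option("user", "scott")                    // assumed credentials
      .option("password", "tiger")
      .load()

Because the numeric value preserves date ordering, Spark's range partitioning over it still follows the creation date, which should spread the rows more evenly than fiscal_year does.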

See this excellent post (not mine): https://medium.com/@radek.strnad/tips-for-using-jdbc-in-apache-spark-sql-396ea7b2e3d3

Upvotes: 1
