Reputation: 183
I have a hive table partitoned by ds
where ds
is a string like 2017-11-07
. Since string is comparable, I want to select latest partition of data from hive by sparksql, so I wrote this code:
Dataset<Row> ds = sparkSession.sql("select max(ds) from admin_zpw123_user_meta");
String s = ds.first().getString(0);
sparkSession.sql("select * from admin_zpw123_user_meta where ds="+s).show();
I can print the string s, which is 2017-11-07
but I didn't get any output from the third statement. I want to know why and is there an elegent way to do this?
Upvotes: 2
Views: 4842
Reputation: 3751
Actions are very inefficient in spark, and you have an useless one on:
String s = ds.first().getString(0);
To fix that, you can filter only the latest partition date simple by doing:
sparkSession.sql("select * from admin_zpw123_user_meta where ds in (select max(distinct ds) from admin_zpw123_user_meta)").show();
Upvotes: 0
Reputation: 28322
You need to have single quotes around the 2017-11-07
string when using it in the SQL statement. You can add it to the query like this:
sparkSession.sql("select * from admin_zpw123_user_meta where ds='" + s + "'").show();
Upvotes: 2
Reputation: 183
I just add ’’
to 2017-11-07 then it works, but it’s still not that elegant
Upvotes: 0