Reputation: 80176
I have ORDERS and SELLERS tables with the following schema:
scala> spark.sql("desc ORDERS").show()
+---------------+---------+-------+
| col_name|data_type|comment|
+---------------+---------+-------+
| order_id| string| null|
| product_id| string| null|
| seller_id| string| null|
| date| string| null|
|num_pieces_sold| string| null|
| bill_raw_text| string| null|
+---------------+---------+-------+
scala> spark.sql("desc SELLERS").show()
+------------+---------+-------+
| col_name|data_type|comment|
+------------+---------+-------+
| seller_id| string| null|
| seller_name| string| null|
|daily_target| string| null|
+------------+---------+-------+
I am getting an error with the following query. I understand we can use a subquery (or withColumn) to avoid this, but I am wondering whether there is an alternative; a sketch of the subquery version follows the error output below.
scala>
q3: String =
"
   SELECT
      o.seller_id,
      o.num_pieces_sold / s.daily_target AS ratio,
      avg(ratio) AS percent_contribution
   FROM orders o
   JOIN sellers s ON o.seller_id = s.seller_id
   GROUP BY o.seller_id
"
scala> spark.sql(q3).show()
org.apache.spark.sql.AnalysisException: cannot resolve '`ratio`' given input columns: [o.bill_raw_text, s.seller_id, o.date, o.seller_id, o.order_id, o.product_id, s.seller_name, o.num_pieces_sold, s.daily_target]; line 5 pos 6;
'Aggregate [seller_id#8], [seller_id#8, (cast(num_pieces_sold#10 as double) / cast(daily_target#20 as double)) AS ratio#421, 'avg('ratio) AS percent_contribution#422]
+- Join Inner, (seller_id#8 = seller_id#18)
:- SubqueryAlias `o`
: +- SubqueryAlias `orders`
: +- Relation[order_id#6,product_id#7,seller_id#8,date#9,num_pieces_sold#10,bill_raw_text#11] parquet
+- SubqueryAlias `s`
+- SubqueryAlias `sellers`
+- Relation[seller_id#18,seller_name#19,daily_target#20] parquet
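For reference, here is a sketch of the subquery workaround I mentioned (q3Sub is just an illustrative name; the alias ratio is introduced in a derived table t so the outer aggregate can resolve it):
scala> val q3Sub = """
  SELECT t.seller_id, avg(t.ratio) AS percent_contribution
  FROM (
    SELECT
        o.seller_id,
        o.num_pieces_sold / s.daily_target AS ratio
    FROM orders o
    JOIN sellers s ON o.seller_id = s.seller_id
  ) t
  GROUP BY t.seller_id
"""
scala> spark.sql(q3Sub).show()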
Upvotes: 1
Views: 1336
Reputation: 29165
It seems Catalyst does not (by design) resolve the intermediate alias ratio when it is reused in another expression of the same SELECT, the way some SQL dialects allow. Using the expression directly inside avg works, as shown below, without aliasing it as ratio first:
package examples

import org.apache.log4j.Level
import org.apache.spark.sql.{SaveMode, SparkSession}

object Ratio extends App {
  // Quieten Spark's logging before the session starts.
  val logger = org.apache.log4j.Logger.getLogger("org")
  logger.setLevel(Level.WARN)

  val spark: SparkSession = SparkSession.builder.appName(getClass.getName)
    .enableHiveSupport()
    .master("local[*]")
    .getOrCreate

  import spark._
  import spark.implicits._

  sql(
    """
      |CREATE TABLE IF NOT EXISTS orders
      |(
      |  order_id string
      |, product_id String
      |, seller_id String
      |, date int
      |, num_pieces_sold String
      |, bill_raw_text String
      |)
    """.stripMargin)

  sql(
    """
      |CREATE TABLE IF NOT EXISTS sellers
      |(
      |  seller_id string
      |, seller_name String
      |, daily_target String
      |)
    """.stripMargin)

  sql("SHOW TABLES").show

  // Sample data; SaveMode.Overwrite replaces the empty tables created above.
  val df1 = Seq(
    ("1", "1", "1", 20200520, 10, "rawtext"),
    ("2", "2", "2", 20200521, 11, "rawtext1")
  ).toDF("order_id", "product_id", "seller_id", "date", "num_pieces_sold", "bill_raw_text")
  df1.write.mode(SaveMode.Overwrite).saveAsTable("orders")
  df1.show

  val df2 = Seq(
    ("1", "seller_name1", "11"),
    ("2", "seller_name2", "22")
  ).toDF("seller_id", "seller_name", "daily_target")
  df2.write.mode(SaveMode.Overwrite).saveAsTable("sellers")
  df2.show

  // The division goes straight into avg, so there is no alias to resolve.
  val q3 =
    """
      | SELECT
      |     o.seller_id,
      |     avg(o.num_pieces_sold / s.daily_target) AS percent_contribution
      | FROM orders o
      | JOIN sellers s ON o.seller_id = s.seller_id
      | GROUP BY o.seller_id
    """.stripMargin
  spark.sql(q3).show
}
Result:
+--------+-------------+-----------+
|database| tableName|isTemporary|
+--------+-------------+-----------+
| default|       orders|      false|
| default| sellers| false|
+--------+-------------+-----------+
20/05/18 21:08:29 INFO log: Updating table stats fast for orders
20/05/18 21:08:29 INFO log: Updated size of table orders to 2949
+--------+----------+---------+--------+---------------+-------------+
|order_id|product_id|seller_id| date|num_pieces_sold|bill_raw_text|
+--------+----------+---------+--------+---------------+-------------+
| 1| 1| 1|20200520| 10| rawtext|
| 2| 2| 2|20200521| 11| rawtext1|
+--------+----------+---------+--------+---------------+-------------+
+---------+------------+------------+
|seller_id| seller_name|daily_target|
+---------+------------+------------+
| 1|seller_name1| 11|
| 2|seller_name2| 22|
+---------+------------+------------+
+---------+--------------------+
|seller_id|percent_contribution|
+---------+--------------------+
| 1| 0.9090909090909091|
| 2| 0.5|
+---------+--------------------+
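For completeness, the withColumn route mentioned in the question can look like this DataFrame-API sketch (assuming the same tables are saved; result is just an illustrative name, and the unqualified columns resolve unambiguously after the join):
import org.apache.spark.sql.functions.avg

val result = spark.table("orders").as("o")
  .join(spark.table("sellers").as("s"), $"o.seller_id" === $"s.seller_id")
  .withColumn("ratio", $"num_pieces_sold" / $"daily_target") // name the intermediate expression first
  .groupBy($"o.seller_id")
  .agg(avg($"ratio").as("percent_contribution"))
result.show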
Upvotes: 3