Reputation: 12869
Is it possible to get a pair RDD from the SQL query below?
The pairs should have the form ((item_id, flight_id), metric1),
where item_id and flight_id are the GROUP BY columns.
SELECT
    item_id,
    flight_id,
    SUM(metric1) AS metric1
FROM mytable
GROUP BY
    item_id,
    flight_id
Upvotes: 0
Views: 48
Reputation: 330093
As mentioned by eliasah, you can simply map over the query result (optionally calling rdd between the query and the map) as follows:
import org.apache.spark.sql.Row

sqlContext.sql(query).map { case Row(item_id: U, flight_id: V, metric1: T) =>
  ((item_id, flight_id), metric1)
}
Here T, U, and V are the types of the data, sqlContext is a SQLContext instance, and query is the query provided in your question.
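For illustration, here is a minimal sketch of the same pattern with concrete types filled in. It assumes item_id and flight_id are BIGINT columns and that SUM(metric1) comes back as a DOUBLE; those types, and the names PairRddFromSql and toPairRdd, are hypothetical and not taken from the question. It calls rdd explicitly so the result is an RDD regardless of Spark version (in Spark 2.x and later, map on the query result returns a Dataset rather than an RDD).

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}

object PairRddFromSql {
  // Assumed types (not from the question): both keys are Long, the summed metric is Double.
  // A row with a null or differently typed column will not match the pattern
  // and will fail with a scala.MatchError when the RDD is evaluated.
  def toPairRdd(sqlContext: SQLContext, query: String): RDD[((Long, Long), Double)] =
    sqlContext.sql(query).rdd.map {
      case Row(itemId: Long, flightId: Long, metric1: Double) =>
        ((itemId, flightId), metric1)
    }
}
```

Because the result is an RDD of two-element tuples, the usual pair RDD operations (reduceByKey, join, and so on) become available on it. If the actual column types differ, adjust the type annotations in the pattern accordingly.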
Upvotes: 1