Reputation: 31
I have a log table that has customer data as follows in the spark instance over cloud storage data. I am trying to query the same using apache zeppelin over spark
CustomerID TimeStamp Distance
------------------------------
1 12.00 310
2 12.00 821
1 12.01 313
3 12.01 734
2 12.01 821
1 12.03 323
3 12.02 734
2 12.03 824
I want to find out if any customer has travelled a distance more than 3 in their two consecutive entires I tried to do a join on the same table on customer ID and put the above conditions in the where clause like below, but that did not help; i think the join on customerID is incorrect and I am getting the entire set of results
Select t1.customerID, t1.timestamp
from sometable
inner join sometable t2 on t2.customerID = t1.customerID
where t2.timestamp-t1.timestamp < .02 and t2.distance - t1.distance > 3
Upvotes: 3
Views: 2162
Reputation: 49260
You can use lag
to do this.
select customerID,timestamp
from (select customerID,timestamp
,distance-lag(distance,1,distance) over(partition by customerID order by timestamp) as diff_with_prev_dist
from sometable
) t
where diff_with_prev_dist > 3
Upvotes: 2