user1948296
user1948296

Reputation: 31

Calculate difference of column values between two row in Spark SQL

I have a log table that has customer data as follows in the spark instance over cloud storage data. I am trying to query the same using apache zeppelin over spark

CustomerID TimeStamp Distance
------------------------------
1           12.00        310
2           12.00        821
1           12.01        313
3           12.01        734
2           12.01        821
1           12.03        323
3           12.02        734
2           12.03        824

I want to find out if any customer has travelled a distance more than 3 in their two consecutive entires I tried to do a join on the same table on customer ID and put the above conditions in the where clause like below, but that did not help; i think the join on customerID is incorrect and I am getting the entire set of results

Select t1.customerID, t1.timestamp 
from sometable
inner join sometable t2 on t2.customerID = t1.customerID
where t2.timestamp-t1.timestamp < .02 and t2.distance - t1.distance > 3

Upvotes: 3

Views: 2162

Answers (1)

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

You can use lag to do this.

select customerID,timestamp
from (select customerID,timestamp
      ,distance-lag(distance,1,distance) over(partition by customerID order by timestamp) as diff_with_prev_dist
      from sometable 
     ) t
where diff_with_prev_dist > 3

Upvotes: 2

Related Questions