Find preceding row / entry under specific conditions in SQL / Redshift

Question

I am trying to find the preceding row to a specific occurrence in a database, or rather some data from it.

In this example I would like to find the movement_method of the preceding row (sorted by timestamp) before a user visited the pub. So in tom's example I would like to know that tom went home by car before visiting the pub. (it doesn't matter how he traveled to the pub but rather the used method before going to the pub)

I have an example database with: user, location, movement_method, timestamp:

user	location	movement_method	timestamp
tom	work	car	2022-03-02 14:30
tom	home	car	2022-03-02 20:30
tom	pub	bus	2022-03-02 22:30
tom	home	foot	2022-03-03 02:30
jane	school	bus	2022-03-02 08:30
jane	home	bus	2022-03-02 14:30
jane	pub	foot	2022-03-02 21:30
jane	home	bus	2022-03-02 23:30
lila	work	bus	2022-03-02 08:30
lila	home	bus	2022-03-02 16:30
jake	friend	car	2022-03-02 15:30
jake	home	bus	2022-03-02 20:30
jake	pub	car	2022-03-02 20:30
jake	home	car	2022-03-03 02:30

For this database the result I would want would be: | user | preceding_movement_method | | ---- | ------- | | tom | car | | jane | bus | | jake | bus |

lila is not being reported because she never visited the pub
I only need to know the preceding movement_method before visiting the pub (sorted by time)
the movement_method which was used to go to the pub is not relevant

My current approach is to have a partition or window function for "preceding_movement_method" but I'm stuck finding the "preceding" entry before the one which fits the where statement.

So I'm looking for something like this pseudocode:

select user,
 (select preceding movement_method 
  from movement_database 
  where location = 'pub'
  order by timestamp) as preceding_movement_method
from movement_database

Bill Weiner · Accepted Answer

The LAG() window function is where I would go with this. I set the (sqlfiddle) data up as:

create table movements (
  uname varchar(16),
  location  varchar(16),
  movement_method   varchar(16),
  ts timestamp
);
 
insert into movements values
('tom', 'work', 'car', '2022-03-02 14:30'),
('tom', 'home', 'car', '2022-03-02 20:30'),
('tom', 'pub', 'bus', '2022-03-02 22:30'),
('tom', 'home', 'foot', '2022-03-03 02:30'),
('jane', 'school', 'bus', '2022-03-02 08:30'),
('jane', 'home', 'bus', '2022-03-02 14:30'),
('jane', 'pub', 'foot', '2022-03-02 21:30'),
('jane', 'home', 'bus', '2022-03-02 23:30'),
('lila', 'work', 'bus', '2022-03-02 08:30'),
('lila', 'home', 'bus', '2022-03-02 16:30'),
('jake', 'friend', 'car', '2022-03-02 15:30'),
('jake', 'home', 'bus', '2022-03-02 20:30'),
('jake', 'pub', 'car', '2022-03-02 20:30'),
('jake', 'home', 'car', '2022-03-03 02:30');

And the SQL as:

select uname, pmove 
from (
  select uname, location,
    lag (movement_method) over (partition by uname order by ts) as pmove
  from movements) as subq
where location = 'pub';

Now many of the timestamps for Jake are all the same so there is some uncertainty there.

I'd stay away from cross joins / loop joins since you are in Redshift and this implies very large datasets and these processes can explode with such large data.

Find preceding row / entry under specific conditions in SQL / Redshift

Answers (2)

Related Questions