Rocco

Reputation: 1475

PySpark Data Frame - give an ID to sequence of same values

I have a dataset in a pyspark job that looks a bit like this:

frame_id    direction_change  
1           False  
2           False  
3           False  
4           True  
5           False  

I want to add a "track" counter to each row so that all the frames between direction changes have the same value. For example, the output I want looks like this:

frame_id    direction_change    track
1           False               1
2           False               1
3           False               1
4           True                2
5           False               2  

I have been able to do this with Pandas with the following action:

frames['track'] = frames['direction_change'].cumsum()

But can't find an equivalent way to do it in Spark data frames. Any help would be really appreciated.
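As a plain-Python illustration of what that cumsum does (a sketch of the logic, not the pandas API itself): the track id is 1 plus the number of direction changes seen so far, which is why a cumulative sum over the boolean column works.

```python
from itertools import accumulate

# Sample data matching the example above.
direction_change = [False, False, False, True, False]

# Running count of True values, offset by 1 so tracks start at 1,
# mirroring what frames['direction_change'].cumsum() computes in pandas.
track = [1 + s for s in accumulate(int(c) for c in direction_change)]
print(track)  # [1, 1, 1, 2, 2]
```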

Upvotes: 0

Views: 901

Answers (1)

zero323

Reputation: 330173

Long story short, there is no efficient way to do this in PySpark with DataFrames alone. One might be tempted to use window functions like this:

from pyspark.sql.functions import col, sum as sum_
from pyspark.sql.window import Window

# A window over the whole DataFrame, ordered by frame_id (no partitioning).
w = Window().orderBy("frame_id")

# Running count of True values; the +1 makes the first track 1 instead of 0.
df.withColumn("track", 1 + sum_(col("direction_change").cast("long")).over(w))

but this is inefficient and won't scale: the window has no partitioning clause, so Spark has to shuffle all the data into a single partition to order it. It is possible to use lower-level APIs as shown in How to compute cumulative sum using Spark, but in Python that requires moving out of the DataFrame API and using plain RDDs.
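The lower-level approach alluded to there computes each partition's sum in a first pass, turns those sums into per-partition offsets, and then adds each partition's offset to its local cumulative sum. A plain-Python sketch of that idea (not Spark API; the partition layout is made up for illustration):

```python
from itertools import accumulate

# direction_change cast to ints, split across three hypothetical partitions.
partitions = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]

# Pass 1: sum of each partition, then exclusive running totals as offsets.
partition_sums = [sum(p) for p in partitions]          # [1, 1, 1]
offsets = [0] + list(accumulate(partition_sums))[:-1]  # [0, 1, 2]

# Pass 2: offset + partition-local cumulative sum, in partition order.
cumulative = [
    off + local
    for off, part in zip(offsets, partitions)
    for local in accumulate(part)
]
print(cumulative)  # [0, 0, 1, 1, 2, 2, 3, 3, 3]
```

On an RDD the same two passes would be distributed (e.g. via `mapPartitionsWithIndex`), which is what makes it scale where the single-partition window cannot.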

Upvotes: 2

Related Questions