PySpark: get the row immediately after the one selected

Question

I have a DataFrame (df) like this:

col1	col2	col3
One	Two	x
One	Two	full
One	Two	y
One	Two	z
One	Two	full
One	Two	u
One	Two	e

Using PySpark, I want to flag each row that comes immediately after a row where col3 == "full" with 1, and every other row with 0, like this:

col1	col2	col3	flag
One	Two	x	0
One	Two	full	0
One	Two	y	1
One	Two	z	0
One	Two	full	0
One	Two	u	1
One	Two	e	0

At the moment this is my idea, but it flags the "full" row itself rather than the row immediately after it:

from pyspark.sql import functions as f
df.withColumn('flag', f.when(f.col('col3') == 'full', 1).otherwise(0))  # marks the 'full' row itself
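The attempt tests col3 on the current row, so it marks the "full" rows themselves. The wanted flag is that same mask shifted down by one row, with 0 for the first row because it has no predecessor. A minimal plain-Python sketch of the difference on the sample col3 values; the helper names are hypothetical and only model the logic, not how Spark evaluates it:

```python
# col3 values from the example DataFrame, in their displayed order
col3 = ["x", "full", "y", "z", "full", "u", "e"]


def current_row_mask(values):
    """What the attempt computes: 1 where this row's own value is 'full'."""
    return [1 if v == "full" else 0 for v in values]


def next_row_flags(values):
    """What is wanted: 1 where the *previous* row's value is 'full'.

    The first row has no previous row, so it is always 0. This mirrors
    lag() returning null, which when(...).otherwise(0) turns into 0.
    """
    mask = current_row_mask(values)
    # Shift the mask down one position; an empty input stays empty.
    return ([0] + mask[:-1]) if mask else []


print(current_row_mask(col3))  # [0, 1, 0, 0, 1, 0, 0]
print(next_row_flags(col3))    # [0, 0, 1, 0, 0, 1, 0]
```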

Can you help me?

wwnde · Accepted Answer

Use lag() with a when() expression over a window:

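from pyspark.sql import Window
from pyspark.sql.functions import lag, when, monotonically_increasing_id
w = Window.partitionBy('col1', 'col2').orderBy('id')  # rows have no inherent order, so order by load position
df.withColumn('id', monotonically_increasing_id()).withColumn('x', when(lag('col3').over(w) == 'full', 1).otherwise(0)).drop('id').show()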

+----+----+----+---+
|col1|col2|col3|  x|
+----+----+----+---+
| One| Two|   x|  0|
| One| Two|full|  0|
| One| Two|   y|  1|
| One| Two|   z|  0|
| One| Two|full|  0|
| One| Two|   u|  1|
| One| Two|   e|  0|
+----+----+----+---+
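The window does two jobs: partitionBy('col1', 'col2') restarts the lookback for each group, and orderBy(...) defines which row counts as "previous". Because a Spark DataFrame has no inherent row order, that ordering has to come from a column; ordering by a column that is constant inside each partition (such as col1 here) leaves the order undefined. The following plain-Python model of lag() over such a window is only a sketch: it assumes a hypothetical row_id field recording each row's original position (the role monotonically_increasing_id() typically plays) and does not reflect how Spark actually executes the query.

```python
from itertools import groupby

# Sample rows from the question; row_id is an assumed explicit ordering key.
rows = [
    {"row_id": i, "col1": "One", "col2": "Two", "col3": v}
    for i, v in enumerate(["x", "full", "y", "z", "full", "u", "e"])
]


def flag_after_full(rows, order_key="row_id"):
    """Model of when(lag('col3').over(w) == 'full', 1).otherwise(0)
    with w = Window.partitionBy('col1', 'col2').orderBy(order_key)."""

    def part_key(r):
        return (r["col1"], r["col2"])

    # Sort by partition, then by the ordering column inside each partition.
    ordered = sorted(rows, key=lambda r: (part_key(r), r[order_key]))
    out = []
    for _, group in groupby(ordered, key=part_key):
        prev = None  # lag() yields null for the first row of each partition
        for r in group:
            out.append({**r, "x": 1 if prev == "full" else 0})
            prev = r["col3"]
    return out


for r in flag_after_full(rows):
    print(r["col1"], r["col2"], r["col3"], r["x"])
```

Passing the rows in reverse order yields the same flags, which is the point of ordering by an explicit key instead of relying on the order rows happen to arrive in.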
