Reputation: 1
I am working with a PySpark DataFrame and am looking for a way to extract the index of the first non-zero element in a column. I added the index column myself, since PySpark, unlike pandas, does not provide one.
Upvotes: 0
Views: 1240
Reputation: 15258
Let's assume your DataFrame looks like this:
df.show()
+---+-----+
|idx|value|
+---+-----+
|  0|    0|
|  1|    0|
|  2|    1|  # <-- We want this one
|  3|    2|
|  4|    3|
|  5|    4|
+---+-----+
You can achieve this easily with a min:
from pyspark.sql import functions as F
df.where(F.col("value") != 0).select(F.min("idx")).show()
Or with a row_number:
from pyspark.sql import functions as F, Window
df.where(F.col("value") != 0).withColumn(
    "rwnb", F.row_number().over(Window.orderBy("idx"))
).where(F.col("rwnb") == 1).select("idx").show()
+---+
|idx|
+---+
|  2|
+---+
Upvotes: 1