HHH

Reputation: 6485

How to create a column with row number in pyspark

I need to create a column in pyspark that holds the row number of each row. I'm using the monotonically_increasing_id function, but it sometimes generates very large values. How can I generate a column whose values start at 1 and go up to the size of my dataframe?

top_seller_elast_df = top_seller_elast_df.withColumn("rank", F.monotonically_increasing_id() + 1)

Upvotes: 1

Views: 519

Answers (1)

notNull

Reputation: 31540

Use the row_number() window function, with the window ordered by a monotonically_increasing_id() column:

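from pyspark.sql import Window
from pyspark.sql.functions import monotonically_increasing_id, row_number

# No partitionBy here, so Spark moves all rows to a single partition to number them.
w = Window.orderBy("mid")

# Helper column that preserves the current row order.
top_seller_elast_df = top_seller_elast_df.withColumn("mid", monotonically_increasing_id())

# row_number() numbers the rows 1, 2, 3, ... in the order defined by the window.
top_seller_elast_df.withColumn("row_number", row_number().over(w)).show()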
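To see why the ids are so large and why ranking them fixes that, here is a rough pure-Python sketch. It is not Spark code: it only imitates the layout the Spark docs describe for monotonically_increasing_id() (partition index in the upper bits, position within the partition in the lower 33 bits). The helper names simulated_monotonic_id and simulated_row_numbers are made up for illustration.

```python
# Illustrative model only, not Spark itself. It mimics the documented layout of
# monotonically_increasing_id(): partition index in the upper bits, position
# within the partition in the lower 33 bits.

def simulated_monotonic_id(partition_index: int, position: int) -> int:
    """Hypothetical helper: build an id following the documented bit layout."""
    return (partition_index << 33) + position


def simulated_row_numbers(ids):
    """Hypothetical helper: map each id to its 1-based rank in ascending order."""
    return {value: rank for rank, value in enumerate(sorted(ids), start=1)}


# Three partitions holding two rows each.
ids = [simulated_monotonic_id(p, i) for p in range(3) for i in range(2)]
row_numbers = simulated_row_numbers(ids)

print(ids)                            # ids jump by 2**33 at each new partition
print([row_numbers[v] for v in ids])  # consecutive numbers starting at 1
```

The ids are increasing and unique but not consecutive, which is why adding 1 to them does not give 1..N. Ranking them, as row_number() does over the ordered window, turns them into consecutive numbers.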

Upvotes: 1
