Generate repeating row number based on partition column in pyspark

I want to generate the quarterly column shown below in PySpark; its value should change after every 4 records for each l_id. Before generating the quarterly column, the data will be ordered by the l_id and week columns.

[image: sample data showing the expected quarterly column]

Upvotes: 0

Views: 413

Answers (1)

Frosty

Reputation: 698

My bad, I was thinking that there was already a quarterly column present in your dataframe, but it seems you need to create a column that looks like quarterly. I don't think that is possible via a Window function alone, but here's a way to achieve it:

Assuming your current data is in df.

from pyspark.sql.functions import split, lit

# split the week value on the literal "month" to get the trailing month number
split_col = split(df["week"], "month")

# divide the month number by sequence_change + 1, truncate to an integer, and add 1
df = df.withColumn(
    "quarterly",
    (split_col.getItem(1).cast("integer") / (df["sequence_change"] + lit(1))).cast("integer") + lit(1),
).orderBy("l_id", "week")

Logic explanation: we extract the month number from the week column values, cast it from string to integer, and divide it by sequence_change + 1. Casting the result back to an integer drops the decimals, and finally we add 1 so that the quarterly column starts at 1 instead of 0.
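For reference, a window-based counter can also produce a value that increments every 4 rows per l_id. This is a minimal sketch, not the approach above: it assumes the dataframe only needs l_id and week for the ordering, and the sample rows are purely illustrative.

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# hypothetical sample data: 8 weekly rows for a single l_id
df = spark.createDataFrame([(1, w) for w in range(1, 9)], ["l_id", "week"])

# number the rows within each l_id by week, then bucket every 4 rows into one quarter
w = Window.partitionBy("l_id").orderBy("week")
df = df.withColumn("quarterly", F.floor((F.row_number().over(w) - 1) / 4) + 1)

df.orderBy("l_id", "week").show()
# quarterly is 1 for the first 4 weeks of each l_id and 2 for the next 4

The only caveat with row_number is that it depends on the ordering within each partition, so the orderBy inside the window must match the ordering you want (here, by week).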

Upvotes: 1
