Reputation: 133
I want to generate the quarterly column shown below in PySpark: for each l_id, its value should change after every 4 records. Before generating the quarterly column, the data is ordered by the l_id and week columns.
Upvotes: 0
Views: 413
Reputation: 698
My bad, I thought a quarterly column was already present in your dataframe, but it seems you need to generate a column like the quarterly one shown. I don't think that is possible via a Window function, but here's a way to achieve this:
Assuming your current data is in df.
from pyspark.sql.functions import lit, split
# Item 1 of the split is the text after 'month', i.e. the month number
split_col = split(df["week"], "month")
month = split_col.getItem(1).cast("integer")
df = df.withColumn("quarterly", (month / (df["sequence_change"] + lit(1))).cast("integer") + lit(1)).orderBy("l_id", "week")
Logic explanation:
We take the month number from the week column values, cast it from a string to an integer, and divide it by the sequence_change value + 1. The result is then cast to an integer to drop the decimals. Finally, 1 is added so that the quarterly column starts at 1 instead of 0.
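To sanity-check the arithmetic without a Spark session, here is a minimal plain-Python sketch of the same formula. The week format ("week1month5") and the sequence_change value of 3 are assumptions made up for illustration, since the actual data isn't shown; the helper name quarterly_value is hypothetical too.

```python
def quarterly_value(week, sequence_change):
    """Plain-Python mirror of the column expression (illustration only)."""
    # Same idea as split(week, 'month').getItem(1): the text after "month"
    month = int(week.split("month")[1])
    # Divide by sequence_change + 1, truncate the decimals, then shift to start at 1
    return int(month / (sequence_change + 1)) + 1


# Hypothetical week values for months 1..8 (assumed format, not from the question)
weeks = ["week1month%d" % m for m in range(1, 9)]
values = [quarterly_value(w, 3) for w in weeks]
print(values)  # [1, 1, 1, 2, 2, 2, 2, 3]
```

Note that under these assumptions the first group holds only three months, because month numbers start at 1. If you need clean blocks of four, subtracting 1 from the month before dividing would be one way to get there.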
Upvotes: 1