Reputation: 95
I have a dataFrame with Boolean records and i want restart the counting when goal=False/Null.
How can i get the Score tab ? The score tab is a count of True values with a reset on False/null values
My df:
Goals
Null
True
False
True
True
True
True
False
False
True
True
Expected Result:
Goals Score
Null 0
True 1
False 0
True 1
True 2
True 3
True 4
False 0
False 0
True 1
True 2
EDIT: Adding more infos
Actually my full dataset is:
Player Goals Date Score
1 Null 2017-08-18 10:30:00 0
1 True 2017-08-18 11:30:00 1
1 False 2017-08-18 12:30:00 0
1 True 2017-08-18 13:30:00 1
1 True 2017-08-18 14:30:00 2
1 True 2017-08-18 15:30:00 3
1 True 2017-08-18 16:30:00 4
1 False 2017-08-18 17:30:00 0
1 False 2017-08-18 18:30:00 0
1 True 2017-08-18 19:30:00 1
1 True 2017-08-18 20:30:00 2
2 False 2017-08-18 10:30:00 0
2 False 2017-08-18 11:30:00 0
2 True 2017-08-18 12:30:00 1
2 True 2017-08-18 13:30:00 2
2 False 2017-08-18 15:30:00 0
I've created a window to calculate the score by player on a certain date
val w = Window.partitionBy("Player","Goals").orderBy("date")
I've tried with the lag function and comparing the values but i can't reset the count.
EDIT2: Add unique Date per player
Thank you.
Upvotes: 0
Views: 1130
Reputation: 95
I finally solved the problem with grouping the goals that occurs together.
I used a count over a partition containing the difference between the row index of the "table" and the row_number related to the partitioned window.
First declare the window with future columns to use
val w = Window.partitionBy("player","goals","countPartition").orderBy("date")
Then populate the columns "countPartition" and "goals" with 1 to keep the rowNumber neutral
val list1= dataList.withColumn("countPartition", lit(1)).withColumn("goals", lit(1)).withColumn("index", rowNumber over w )
the udf
def div = udf((countInit: Int, countP: Int) => countInit-countP)
And finally calculate the score
val score = list1.withColumn("goals", goals).withColumn("countPartition", div(col("index") , rowNumber over w )).withColumn("Score", when(col("goals") === true, count("goals") over w ).otherwise(when(col("goals") isNull, "null").otherwise(0))).orderBy("date")
Upvotes: 1