Liu Chong

Reputation: 333

How Can I Get an Accumulated Value Based on Time, Using PySpark SQL?

I have a table like this:

[Table image: columns artist_id, Ds, and like]

In this table, artist_id identifies a particular singer, Ds is a date (from 2015 Mar 1st to the end of April), and like is how many people liked this singer's songs on that particular day. I want to get the accumulated value of like; for example, on 20150303 the value would be the sum of the original values for 20150301, 20150302, and 20150303. How can I do this?

Upvotes: 1

Views: 198

Answers (1)

vinay

Reputation: 1416

You can use the aggregate functions provided by Spark to get this output.

Your question says "based on time", but as per the schema it's actually a date column, so you aggregate on Ds and take the sum of like, similar to:

df.groupBy("Ds").sum("like")
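As a minimal runnable sketch of the snippet above, assuming the schema from your screenshot (artist_id, Ds, like) and made-up values:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("like-sums").getOrCreate()

# Hypothetical sample rows matching the question's schema
df = spark.createDataFrame(
    [("a1", "20150301", 10),
     ("a1", "20150302", 5),
     ("a1", "20150303", 7)],
    ["artist_id", "Ds", "like"],
)

# Total likes per day, across all artists
df.groupBy("Ds").sum("like").orderBy("Ds").show()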

Update: To get the sum over all days up to a provided date, filter for that date and the previous dates to fetch the matching rows, then sum them all up with a reduce or the aggregate function sum.
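A minimal sketch of that filter-and-sum approach, reusing the hypothetical sample data from above (the cutoff 20150303 is just an example value):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("accumulated-like").getOrCreate()
df = spark.createDataFrame(
    [("a1", "20150301", 10), ("a1", "20150302", 5), ("a1", "20150303", 7)],
    ["artist_id", "Ds", "like"],
)

cutoff = "20150303"  # the "provided date"

# Keep this date and all previous dates, then sum the likes.
# Comparing yyyyMMdd strings lexicographically is safe here.
(df.filter(F.col("Ds") <= cutoff)
   .agg(F.sum("like").alias("accumulated_like"))
   .show())

If you need the running total for every date at once rather than a single cutoff, a cumulative sum over a window (pyspark.sql.Window ordered by Ds, with rowsBetween from Window.unboundedPreceding to Window.currentRow) is a common alternative to running one filter per date.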

More details can be found here.

Upvotes: 1
