Reputation: 333
I have a table like this:

In this table, `artist_id` stands for a particular singer, `Ds` is a date (from 2015 Mar 1st to the end of April), and `like` is how many people liked this singer's songs on that particular day.

I want to get the accumulated value of `like`. For example, on day 20150303, the value should be the sum of the original values for 20150301, 20150302, and 20150303.

How can I do that?
Upvotes: 1
Views: 198
Reputation: 1416
You can use the aggregate functions provided by Spark to get the output.

Your question says "based on time", but per the schema it is actually a date column, so you aggregate on `Ds` and take the sum of `like`, similar to:

df.groupBy("Ds").sum("like")
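To see what this grouping does, here is the same per-day aggregation sketched in plain Python (not Spark) on toy rows; the artist ids and like counts are made up for illustration:

```python
from collections import defaultdict

# Toy rows mimicking the schema: (artist_id, Ds, like); values are made up.
rows = [
    (1, "20150301", 10),
    (1, "20150302", 5),
    (2, "20150301", 3),
    (2, "20150302", 7),
]

# Equivalent of df.groupBy("Ds").sum("like"): total likes per day.
per_day = defaultdict(int)
for artist_id, ds, like in rows:
    per_day[ds] += like

print(sorted(per_day.items()))
```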
Update:

To get the sum over all days up to the provided date, you can apply a `filter` for that date and the earlier ones to fetch this day's and the previous days' rows, and then sum them up using `reduce` or the aggregate function `sum`. More details can be found here.
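The filter-then-sum logic can be sketched in plain Python (not Spark); the rows and the target date below are made up for illustration:

```python
# Toy rows mimicking the schema: (artist_id, Ds, like); values are made up.
rows = [
    (1, "20150301", 10),
    (1, "20150302", 5),
    (1, "20150303", 2),
    (1, "20150304", 8),
]

target = "20150303"  # hypothetical "provided date"

# Keep only rows on or before the target date, then sum the like values.
# This mirrors filter + reduce (or the aggregate sum) in Spark.
cumulative = sum(like for _, ds, like in rows if ds <= target)
print(cumulative)
```

Note that the string comparison on `Ds` works here only because the dates are zero-padded `yyyyMMdd` strings, which sort the same way lexicographically as chronologically.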
Upvotes: 1