Reputation: 153
Today, I need your help.
I have a stats website, I get data from Game Webservices. I want to implement a new function but I don't know how.
I want to guess players' connection hours.
I have a script which collects data every hour and stores this data in a table. Imagine that I have a table with: player_id, score and the hour (Integer, just H), and the day number of the month.
Then, for example, if the score between hour 17 and 18 is different then player has been connected to his account.
To simplify, imagine that I have a table with day from 1 to 31 and hour from 0 to 23 for every day.
At the end of the month I need to execute a query to calculate for each hour, the number of days the player has been connected during this hour.
Example :
0 => 31 The player has been connected between 23 and 0 : every days
1 => 3 The player has been connected between 0 and 1 : 3 days a month
2 => 5 The player has been connected between 1 and 2 : 5 days a month
3 => 10 The player has been connected between 3 and 4 : 10 days a month
...
23 => 4
I think I can ORDER BY days and hour and player_id from day 1 hour 0 to day 31 hour 23 And do a first SELECT with a CASE like :
SELECT
table.*,
(CASE WHEN ACTUAL_ROW.score!=PREVIOUS_ROW.score THEN 1 ELSE 0) AS active
FROM table
TO know for each row if the player has been connected. AND THEN It's Simple to do a GROUP BY and a SUM for each hour. But I don't know how I can compare previous row with actual
Do you have any IDEA or hint how to do this ? Is PL/SQL Better to do this ?
Note :I'm using PostGreSQL
Thanks
Upvotes: 2
Views: 158
Reputation: 57418
The problem here is that we're not checking when the player "was connected" but instead when the player "earned points", which can be similar - or not; and this at intervals of one hour (three logins in one hour count as one). Just as well, a player remaining logged three hours and accruing points in that period will result as being "logged" in one, two or three data points, depending.
With those caveats, we can JOIN the score table with itself:
SELECT a.player_id, a.day, a.hour, a.score - b.score AS chg
FROM cdata AS a
JOIN cdata AS b
ON (
(a.player_id = b.player_id AND a.score != b.score)
AND (
(a.hour > 0 AND a.day = b.day AND b.hour = a.hour-1)
OR
(a.hour = 0 AND a.day = b.day+1 AND b.hour = 23)
)
)
This will yield a series of statistics for the user, with the day and hour when his score changed.
You can use this in a collecting subSELECT
SELECT player_id, hour, COUNT(player_id) FROM ( ... ) AS changes
GROUP BY player_id, hour
ORDER BY player_id, hour;
and this will return in 'changes' a number between 1 and 31. Hours with no logins will not be counted.
I have attempted to provide a test case with this SQLFiddle. The above is not PostgreSQL specific, you can optimize the inner query using PostgreSQL window functions.
Upvotes: 0
Reputation: 28591
You can access the previous row of the table with LAG
window function.
Try using something like
SELECT player_id, count(CASE WHEN score > prev_score THEN 1 END)
FROM(
SELECT player_id, score, mm, hh, LAG(score) OVER (ORDER BY mm,hh) as prev_score
FROM your_table)
GROUP BY player_id
Additional advise - store full timestamps instead of day and hour fields. You can always get the day and hour from timestamp with functions.
Manual on window functions: one
, two
Upvotes: 3