Reputation: 187
Chrome History
SQlite 3 database has the urls
table like this:
CREATE TABLE "urls"(
id INTEGER PRIMARY KEY AUTOINCREMENT,
url LONGVARCHAR,
title LONGVARCHAR,
visit_count INTEGER DEFAULT 0 NOT NULL,
typed_count INTEGER DEFAULT 0 NOT NULL,
last_visit_time INTEGER NOT NULL,
hidden INTEGER DEFAULT 0 NOT NULL
)
If you also use Chrome and OS X, you can find it at ~/Library/Application Support/Google/Chrome/Default/History
. Other system can see this post: Where does Chrome save its SQLite database to?
I want to count the nubmers of title
but in separated time span(like a day).
The simplified table is like below:
id | title | last_visit_time |
--------+-------+-------------------+
"240255" "foo" "13197393082183934"
"240256" "foo" "13197393082183934"
"240259" "foo" "13197393103031880"
"251615" "foo" "13201270874907168"
"251616" "foo" "13201270904026269"
"251631" "foo" "13201271376777156"
"258228" "goo" "13203150937196411"
"258229" "goo" "13203150937196411"
"258230" "goo" "13203150937196411"
The last_visit_time
is "formatted as the number of microseconds since January, 1601".
To a given span
(like 86400000000, a day in microseconds), all same title
only count 1 time, because they are in a same period.
The part of pseudocode to count for a distinct title
in Python style is like:
count = 0
all_last_visit_time_sorted = all_last_visit_time.sort() # generally it should be sorted because the `history` data was added into database in time order.
span = 86400000000 # a day in microseconds
current_base_time = all_last_visit_time_sorted[0]
for last_visit_time in all_last_visit_time_sorted:
if last_visit_time - current_base_time > span:
count += 1
current_base_time = last_visit_time
To be more clear, the partition is like:
id | title | last_visit_time |
--------+-------+-------------------+
"240255" "foo" "13197393082183934"
"240256" "foo" "13197393082183934"
"240259" "foo" "13197393103031880"
------------------------------------
"251615" "foo" "13201270874907168"
"251616" "foo" "13201270904026269"
"251631" "foo" "13201271376777156"
------------------------------------
"258228" "goo" "13203150937196411"
"258229" "goo" "13203150937196411"
"258230" "goo" "13203150937196411"
Then this is what I expect:
title | count |
-------+--------+
"foo" | 2 |
"goo" | 1 |
I don't know how to do these with SQL although after google such keywords as window function
,subquery
and so on.
p.s: data sample
INSERT INTO urls VALUES ("460534","https://stackoverflow.com/questions/ask","Ask a Question - Stack Overflow","1","0","13197393082183934","0");
INSERT INTO urls VALUES ("460535","https://stackoverflow.com/questions/ask/wizard","Ask a Question - Stack Overflow","1","0","13197393082183965","0");
INSERT INTO urls VALUES ("460536","https://stackoverflow.com/questions/ask?guided=true","Ask a Question - Stack Overflow","1","0","13197393082184036","0");
INSERT INTO urls VALUES ("460537","https://stackoverflow.com/search?q=%5Bhadoop%5D","Highest Voted 'hadoop' Questions - Stack Overflow","1","0","13196783835451652","0");
INSERT INTO urls VALUES ("460538","https://stackoverflow.com/questions/tagged/hadoop","Highest Voted 'hadoop' Questions - Stack Overflow","1","0","13196783835452653","0");
INSERT INTO urls VALUES ("460539","https://stackoverflow.com/questions/ask","Ask a Question - Stack Overflow","1","0","13407393082183934","0");
INSERT INTO urls VALUES ("460540","https://stackoverflow.com/questions/ask/wizard","Ask a Question - Stack Overflow","1","0","13407393082184035","0");
INSERT INTO urls VALUES ("460541","https://stackoverflow.com/questions/ask?guided=true","Ask a Question - Stack Overflow","1","0","13407393082184236","0");
Upvotes: 0
Views: 157
Reputation: 8758
To me, this is just another flavor of Gaps and Islands. You have to compare each row's last_visit_time to the previous row's. If they are within a day of each other (86400000000), then you want to count them as 1 visit. We'll derive a column to indicate if the current row is within 1 day of the previous row.
Next, you need to take that newly derived column (gap
in the dbfiddle), and do a running sum on it. Each time the gap goes to 1, you'll see that running sum go up. So all the rows where the running sum are the same, you'll count them together.
Finally, you can do a count distinct on that grouper column, and you'll get your output.
Here's a DBFiddle (with a few more rows added to your first set of sample data to get a couple more days in there), broken down step by step.
Here's the final (ugly) query:
with fst as (
select
id, title,
last_Visit_time,
lag(last_visit_time,1,last_visit_time) over (partition by title order by last_visit_time) ,
case
when last_visit_time - lag(last_visit_time,1,last_visit_time) over (partition by title order by last_visit_time) < 86400000000
then 0
else 1
end as gap
from
<your table>
order by last_visit_time)
select
title,
count (distinct grouper)
from (
select
id,
title,
gap,
sum(gap) over (partition by title order by last_visit_time rows unbounded preceding) as grouper
from
fst) t
Upvotes: 1