Reputation: 2420
Background: I am storing the user's Wikipedia browsing history in an SQLite database in an Android app, with a view to maintaining and presenting that history as a graph.
This is the structure of the graph: every time the user clicks on a link in a wiki page (to another wiki page), a WikiPage (if not already present) and a PageVisit are created for the new page. A VisitEdge back to the previous WikiPage, via its PageVisit, is also added (clear as mud?).
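The structure described above can be sketched as three tables. This is only an illustration run through Python's `sqlite3` module rather than the Android API. The column names `wikipage_id`, `created_at`, `from_pagevisit_id` and `to_pagevisit_id` follow the queries used elsewhere in this thread, while the `title` column and the `record_visit` helper are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE WikiPage  (_id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE PageVisit (_id INTEGER PRIMARY KEY,
                        wikipage_id INTEGER REFERENCES WikiPage(_id),
                        created_at INTEGER);
CREATE TABLE VisitEdge (_id INTEGER PRIMARY KEY,
                        from_pagevisit_id INTEGER REFERENCES PageVisit(_id),
                        to_pagevisit_id INTEGER REFERENCES PageVisit(_id));
"""

def record_visit(db, title, created_at, previous_visit_id=None):
    """Store one page view and return the id of the new PageVisit (hypothetical helper)."""
    # Create the WikiPage only if it is not already present.
    db.execute("INSERT OR IGNORE INTO WikiPage (title) VALUES (?)", (title,))
    page_id = db.execute("SELECT _id FROM WikiPage WHERE title = ?",
                         (title,)).fetchone()[0]
    # Every view gets its own PageVisit row, even for a page seen before.
    visit_id = db.execute(
        "INSERT INTO PageVisit (wikipage_id, created_at) VALUES (?, ?)",
        (page_id, created_at)).lastrowid
    # Link the new visit back to the visit the user came from, if any.
    if previous_visit_id is not None:
        db.execute(
            "INSERT INTO VisitEdge (from_pagevisit_id, to_pagevisit_id) VALUES (?, ?)",
            (previous_visit_id, visit_id))
    return visit_id

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
a = record_visit(db, "A", 1)        # first page, no incoming edge
b = record_visit(db, "B", 2, a)     # click from A to B
a2 = record_visit(db, "A", 3, b)    # click from B back to A: new visit, same page
```

After these three calls there are two WikiPage rows, three PageVisit rows and two VisitEdge rows.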
Writing all this to my database works fine.
I am trying to write two SQL statements, INLINKS and OUTLINKS, so that if the user is on a given WikiPage, the queries return the inlinks and outlinks of that page respectively. The rows returned should be in chronological order (newest first), contain no repeats, and show alongside each entry a count of the number of times that page has been visited historically:
e.g. The user is on B so the query would return
INLINKS : (row1) A._id, 2._id, 2.wikipage_id, 2.created_at, COUNT = 1
OUTLINKS : (row1) C._id, 6._id, 6.wikipage_id, 6.created_at, COUNT = 2
(row2) D._id, 5._id, 5.wikipage_id, 5.created_at, COUNT = 1
e.g. If user is at C
INLINKS : (row1) B._id, 3._id, 3.wikipage_id, 3.created_at, COUNT = 1
(row2) A._id, 1._id, 1.wikipage_id, 1.created_at, COUNT = 1
OUTLINKS: (no rows)
NOTE: I am just returning all columns at the moment...first-things-first!
This is the current OUTLINK statement:
private static final String SELECT_OUTLINKS_QUERY = "SELECT DISTINCT wp.*,ve.*,pv.* FROM " + TABLE_NAME_PAGEVISITS + " pv "
+ " JOIN " + TABLE_NAME_VISITEDGES + " ve ON pva." + PageVisit.KEY_ID + " = ve." + VisitEdge.KEY_FROM_PAGEVISIT_ID
+ " JOIN " + TABLE_NAME_PAGEVISITS + " pva ON pva." + PageVisit.KEY_WIKIPAGE_ID + " = ve." + VisitEdge.KEY_TO_PAGEVISIT_ID
+ " JOIN " + TABLE_NAME_WIKIPAGES + " wp ON wp." + WikiPage.KEY_ID + " = pv." + PageVisit.KEY_WIKIPAGE_ID
+ " WHERE pv." + PageVisit.KEY_WIKIPAGE_ID + " =?" // ? is the current wikiPage id!
+ " ORDER BY pva." + PageVisit.KEY_CREATED_AT + " DESC";
Needless to say, I am pretty green when it comes to SQL, and this is not working so well: it returns something every once in a while, but rarely the right thing, and most of the time nothing. I have experimented with putting a COUNT statement in too, but have had even less luck with this.
The INLINKS statement is similar except 'to' is swapped with 'from'.
Is anyone able to help me? I am getting into a bit of a mess with it. For instance, is it right to be defining two PageVisit tables, pv and pva? How would I get the count in there?
An answer... not the one I used. Thanks to @CL. for his answer below; I actually ended up using his solution. After asking the question, though, I tried to solve it on my own, and this is what I ended up with. I have included it for completeness, and I would be interested in the pros and cons of both approaches.
SELECT DISTINCT wp.*, c.n, mtable.mx
FROM PageVisit pv
JOIN VisitEdge ve ON pv._id = ve.frompagevisit_id
JOIN PageVisit pv1 ON pv1._id = ve.topagevisit_id
JOIN WikiPage wp ON wp._id = pv1.wikipage_id
JOIN (SELECT wikipage_id, MAX(created_at) AS mx FROM PageVisit GROUP BY wikipage_id) mtable ON wp._id = mtable.wikipage_id
JOIN (SELECT wikipage_id, COUNT(*) AS n FROM PageVisit GROUP BY wikipage_id) c ON wp._id = c.wikipage_id
WHERE pv.wikipage_id = ?
ORDER BY mtable.mx DESC;
Upvotes: 1
Views: 235
Reputation: 180200
First, the page visits of interest are those of the current page:
SELECT *
FROM PageVisit
WHERE wikipage_id = ?
For OUTLINKS, the corresponding outgoing visit edges can be looked up:
SELECT *
FROM VisitEdge
WHERE from_pagevisit_id IN (SELECT _id
                            FROM PageVisit
                            WHERE wikipage_id = ?)
Now this can be used to look up the destination page visits:
SELECT *
FROM PageVisit
WHERE _id IN (SELECT to_pagevisit_id
              FROM VisitEdge
              WHERE from_pagevisit_id IN (SELECT _id
                                          FROM PageVisit
                                          WHERE wikipage_id = ?))
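On the question of whether two PageVisit aliases are legitimate: the nested IN lookup above can also be written as a self-join in which PageVisit appears twice, once as the source visit and once as the destination visit. The following sketch, run through Python's `sqlite3` module, compares the two forms on made-up sample data. The table contents, the `title` column and the alias names `src`, `ve` and `dst` are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE WikiPage  (_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE PageVisit (_id INTEGER PRIMARY KEY, wikipage_id INTEGER, created_at INTEGER);
CREATE TABLE VisitEdge (_id INTEGER PRIMARY KEY, from_pagevisit_id INTEGER, to_pagevisit_id INTEGER);
INSERT INTO WikiPage VALUES (1,'A'), (2,'B'), (3,'C'), (4,'D');
-- Hypothetical browsing sequence A, B, C, B, D, B, C; created_at equals the visit id.
INSERT INTO PageVisit VALUES (1,1,1), (2,2,2), (3,3,3), (4,2,4), (5,4,5), (6,2,6), (7,3,7);
INSERT INTO VisitEdge VALUES (1,1,2), (2,2,3), (3,3,4), (4,4,5), (5,5,6), (6,6,7);
""")

# Destination visits found with nested IN subqueries, as in the answer.
NESTED = """
SELECT _id
FROM PageVisit
WHERE _id IN (SELECT to_pagevisit_id
              FROM VisitEdge
              WHERE from_pagevisit_id IN (SELECT _id
                                          FROM PageVisit
                                          WHERE wikipage_id = ?))
ORDER BY _id
"""

# The same lookup as a self-join: every alias is introduced before it is used.
JOINED = """
SELECT dst._id
FROM PageVisit AS src
JOIN VisitEdge AS ve  ON ve.from_pagevisit_id = src._id
JOIN PageVisit AS dst ON dst._id = ve.to_pagevisit_id
WHERE src.wikipage_id = ?
ORDER BY dst._id
"""

page_b = 2
nested_rows = [row[0] for row in db.execute(NESTED, (page_b,))]
joined_rows = [row[0] for row in db.execute(JOINED, (page_b,))]
print(nested_rows, joined_rows)   # [3, 5, 7] [3, 5, 7]
```

The two forms agree here because each destination visit has only one incoming edge. If several edges could point at the same visit, the join form would return that visit once per edge, whereas IN would return it once.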
The corresponding wiki page data can be added with a join:
SELECT *
FROM WikiPage
JOIN PageVisit ON WikiPage._id = PageVisit.wikipage_id
WHERE PageVisit._id IN (SELECT to_pagevisit_id
                        FROM VisitEdge
                        WHERE from_pagevisit_id IN (SELECT _id
                                                    FROM PageVisit
                                                    WHERE wikipage_id = ?))
To remove page duplicates, group by the page.
There can be multiple visit records for one output record, so it is no longer possible to output PageVisit values directly; they must be handled with some aggregate function:
SELECT WikiPage.*,
       MAX(PageVisit.created_at) AS last_date,
       COUNT(*) AS "count"
FROM WikiPage
JOIN PageVisit ON WikiPage._id = PageVisit.wikipage_id
WHERE PageVisit._id IN (SELECT to_pagevisit_id
                        FROM VisitEdge
                        WHERE from_pagevisit_id IN (SELECT _id
                                                    FROM PageVisit
                                                    WHERE wikipage_id = ?))
GROUP BY WikiPage._id
ORDER BY last_date DESC
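To check the final query against data, here is a small sketch using Python's `sqlite3` module. The sample rows and the `title` column are made up for illustration. The INLINKS variant is not spelled out above; it is assumed here to be the same query with `from_pagevisit_id` and `to_pagevisit_id` swapped, as the question suggests.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE WikiPage  (_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE PageVisit (_id INTEGER PRIMARY KEY, wikipage_id INTEGER, created_at INTEGER);
CREATE TABLE VisitEdge (_id INTEGER PRIMARY KEY, from_pagevisit_id INTEGER, to_pagevisit_id INTEGER);
INSERT INTO WikiPage VALUES (1,'A'), (2,'B'), (3,'C'), (4,'D');
-- Hypothetical browsing sequence A, B, C, B, D, B, C; created_at equals the visit id.
INSERT INTO PageVisit VALUES (1,1,1), (2,2,2), (3,3,3), (4,2,4), (5,4,5), (6,2,6), (7,3,7);
INSERT INTO VisitEdge VALUES (1,1,2), (2,2,3), (3,3,4), (4,4,5), (5,5,6), (6,6,7);
""")

# Template shared by both directions; {near} is the edge column on the
# current page's side, {far} the edge column on the linked page's side.
LINKS = """
SELECT WikiPage.title,
       MAX(PageVisit.created_at) AS last_date,
       COUNT(*) AS "count"
FROM WikiPage
JOIN PageVisit ON WikiPage._id = PageVisit.wikipage_id
WHERE PageVisit._id IN (SELECT {far}
                        FROM VisitEdge
                        WHERE {near} IN (SELECT _id
                                         FROM PageVisit
                                         WHERE wikipage_id = ?))
GROUP BY WikiPage._id
ORDER BY last_date DESC
"""
OUTLINKS = LINKS.format(near="from_pagevisit_id", far="to_pagevisit_id")
INLINKS = LINKS.format(near="to_pagevisit_id", far="from_pagevisit_id")

page_b = 2
outlinks = db.execute(OUTLINKS, (page_b,)).fetchall()
inlinks = db.execute(INLINKS, (page_b,)).fetchall()
print(outlinks)   # [('C', 7, 2), ('D', 5, 1)]
print(inlinks)    # [('D', 5, 1), ('C', 3, 1), ('A', 1, 1)]
```

Note that COUNT(*) here counts only the visits that pass the WHERE filter, that is, visits reached from (or leading to) the current page, not every visit ever made to the linked page. In this sample data the two numbers happen to coincide.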
Upvotes: 1