Reputation: 2929
Table "tags":
Source Target Weight
#003 blitzkrank 0.83
#003 deutsch 0.7
#003 brammen 0.57
#003 butzfrauen 0.55
#003 solaaaa 0.5
#003 moments 0.3
college scandal 1.15
college prosecutors 0.82
college students 0.41
college usc 0.33
college full house 0.17
college friends 0.08
college house 0.5
college friend 0.01
The table has 5.600.000 rows and ~91.000 unique entries in column "Source".
For every unique value in "Source" and "Target" I need the top x% rows (e.g. top 20%, top 30%, needs to be variable) by Weight (table is sorted by "Source" (ascending) and "Weight" (descending).
Since there will be duplicates (e.g. "Source = "college" will yield at least one duplicate row as "Target" = "scandal") duplicated entries should be removed if possible. Otherwise not a big deal.
Calculation for "Source":
6 rows where Source = "#003", 6 * 0.2 = 1.2 = take 1 row
8 rows where Source = "college", 8 * 0.2 = 1.6 = take 2 rows
Needed result table for "Source":
Source Target Weight
#003 blitzkrank 0.83
college scandal 1.15
college prosecutors 0.82
How can I do that in SQL in an SQLite database?
Upvotes: 1
Views: 106
Reputation: 1270421
If you want a sample by source
:
select t.*
from (select t.*,
row_number() over (partition by source order by weight desc, target) as seqnum,
count(*) over (partition by source) as cnt
from t
) t
where seqnum = 1 or -- always at least one row
seqnum <= round(cnt * 0.2);
Based on your example, I think this is what you want. You can construct a similar query for target
.
Upvotes: 1