FProlog
FProlog

Reputation: 173

Average ratio between two columns

I've been collecting tweets, and running some queries on them. In one of the queries I'm trying to find all users that have an an average follower to friend ratio of at least two (i.e. twice as many followers as friends). Finding the average is important because I have collected the tweets of some users multiple times, at different times. Their friends or followers can change, so they can come up with multiple ratios.

I can do it without averages, resulting in multiple results for some users

SELECT screenname,followers,friends,usertweets, followers/friends AS FFRatio 
FROM tweets
WHERE friends >0 AND followers/friends>2

When I try to use AVG, I get errors, and no results.

SELECT screenname,followers,friends,usertweets, AVG(followers/friends) AS FFRatio 
FROM tweets
WHERE friends >0 AND AVG(followers/friends)>2

Can you tell me what is wrong, please?

Upvotes: 1

Views: 2607

Answers (2)

philipxy
philipxy

Reputation: 15118

Your test for > 0 is not necessarily executed before the division. If you have these in the same condition then you can use a CASE, which executes cases in order. You could move the ratio to a HAVING, which is evaluated after WHERE.

To average over the ratios per screen name, GROUP BY screenname.

SELECT screenname, AVG(followers/friends) AS FFRatio 
FROM tweets
WHERE friends > 0
GROUP BY screenname
HAVING FFRatio > 2

(AVG is an aggregate function and must appear in HAVING or SELECT. Other columns in the GROUP BY can appear in the SELECT. If a column is not single-valued per group then there is no one value to SELECT. But if columns are single-valued per group and you want them output then you can add them to the GROUP BY & SELECT.)

BUT Probably what you really want is people who have no friends but have followers or have friends and also a good enough ratio. For a single ratio per screenname you could display a blank when there is no ratio and a number when there is. But how to average ratios in these cases?--there is not always a ratio.

BUT It is extremely unlikely that the average of a ratio or average is relevant. Eg averaging 100/4 and 99/99 gives (25+1)/2 = 13. But the average over all observations is 199/103. Similarly in your case you may have observations with followers but no friends hence with no ratio but that should be taken into account.

If you want some notion of "recent" ratio then you could use AVG(followers)/AVG(friends) where the values are from some standard time interval. Maybe your input is the activity for such an interval or the approximation of one. (You'd still be presuming a uniform distribution of observations over time.) If so:

SELECT screenname, AVG(followers) AS FoAvg, AVG(friends) AS FrAvg, 
    CASE WHEN FrAvg = 0 THEN ''
    ELSE FoAvg/FrAvg END AS FFRatio
FROM tweets
GROUP BY screenname
HAVING CASE WHEN FrAvg = 0 AND FoAvg > 0 THEN 1
   WHEN FrAvg > 0 AND FoAvg/FrAvg > 2 THEN 1
   ELSE 0 END

You have to decide just what you want your query to return. But it's extremely unlikely to be an average of ratios.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269843

You need a group by and having clause:

SELECT screenname, AVG(followers/friends) AS FFRatio 
FROM tweets
WHERE friends > 0
GROUP BY screenname
HAVING AVG(followers/friends) > 2;

I assume screenname is what you mean by "user". Don't put extraneous columns in the SELECT -- the columns should either be in the GROUP BY or arguments to aggregation functions.

Upvotes: 1

Related Questions