idf4floor
idf4floor

Reputation: 73

Get most frequent value in column using Sqlite in R

I have 2 tables , the first one is called Athletes and has columns that include :

athlete_id,first_name,last_name,country_id

The second table is called medals and

has the columns:

year,sport,event,athlete_id,medal

This is what a part of Athletes looks like:

 athlete_id first_name last_name country_id
1 BURKETOM01        Tom     Burke        USA
2 HOFMAFRI01      Fritz   Hofmann        DEU
3  LANEFRA01    Francis      Lane        USA
4 SZOKOALA01     Alajos  Szokolyi        HUN
5 BURKETOM01        Tom     Burke        USA
6 JAMISHER01    Herbert   Jamison        USA

This is what a part of Medals looks like:

year         sport    event athlete_id  medal

1 1896 Track & Field 100m Men BURKETOM01   GOLD
2 1896 Track & Field 100m Men HOFMAFRI01 SILVER
3 1896 Track & Field 100m Men  LANEFRA01 BRONZE
4 1896 Track & Field 100m Men SZOKOALA01 BRONZE
5 1896 Track & Field 400m Men BURKETOM01   GOLD
6 1896 Track & Field 400m Men JAMISHER01 SILVER

I need to find the athlete with the highest number of medals and the number of medals he has won. The correct answer is

 full_name top_no_medals
1 Larisa Latynina 18

I saw a few posts similar to this and tried using what was suggested there This is my code:

dbGetQuery(olympics.db,statement = 
             'SELECT Athletes.first_name ||" "|| Athletes.last_name AS full_name,COUNT(Medals.athlete_id) 
           AS top_no_medals
           FROM Athletes
           JOIN  Medals ON Medals.athlete_id=Athletes.athlete_id
           GROUP BY full_name
           ORDER BY COUNT(*) DESC
           LIMIT 1'
)

what I am trying to do is combine the 2 tables according to the variable athlete_id and the count this variable. For some reason the answer I get is with too many medals.

          full_name top_no_medals
1   Larisa Latynina           324

It seems there is a mistake couting the medal. Now I am sure the data is correct, when I check Medals table to see What is the most common athlete_id I get the correct one. This is the code and answer:

dbGetQuery(olympics.db,statement = ' Select athlete_id, count(*) AS top_no_medals
 From Medals
Group By athlete_id
ORDER BY COUNT(*) DESC
LIMIT 1')

 athlete_id top_no_medals
1 LATYNLAR01            18

This athlete_id is the one associated with Larisa Latynina so its not a problem in the medals table.

Upvotes: 0

Views: 805

Answers (2)

Philippe Marchand
Philippe Marchand

Reputation: 676

Given that the second query (get medal count) works, I would suggest computing it first then joining with Athletes to get name matching that ID:

SELECT a.first_name || ' ' || a.last_name AS full_name, top_no_medals
FROM (SELECT athlete_id, count(*) AS top_no_medals
      FROM Medals
      GROUP BY athlete_id
      ORDER BY COUNT(*) DESC
      LIMIT 1) m
JOIN Athletes a ON m.athlete_id = a.athlete_id

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270713

Based on the fact that your second query works, the problem would appear to be the column athletes(athlete_id). This seems to be duplicated.

Try running this query:

select athlete_id, count(*) as cnt
from athletes
group by athlete_id
order by count(*) desc;

The maximum cnt should be 1. If not, you have a problem with your data.

Oh, never mind with that. Your Atheletes table does have duplicates. Assuming the name is always the same, you can do:

SELECT a.full_name, COUNT(*) AS top_no_medals
FROM (SELECT a.athlete_id,
             MAX(a.first_name || ' ' || a.last_name) as full_name
      FROM Athletes a 
      GROUP BY athlete_id
     ) a
     Medals m
     ON m.athlete_id = a.athlete_id
GROUP BY full_name
ORDER BY COUNT(*) DESC
LIMIT 1;

You could also revise your query to use COUNT(DISTINCT), assuming that each medal has a unique identifier:

SELECT a.first_name || ' ' || a.last_name AS full_name,
       COUNT(DISTINCT m.medal_id) as top_no_medals
FROM Athletes a JOIN
     Medals m
     ON m.athlete_id = a.athlete_id
GROUP BY full_name
ORDER BY COUNT(DISTINCT m.medal_id) DESC
LIMIT 1;

Upvotes: 2

Related Questions