Outback
Outback

Reputation: 542

Relational algebra to count rows

I wasn't sure quite what to call this problem but it's not exactly counting rows. Let's say we have the relation:

Competition(compId, sport, playerName, medal)

And let's say the attribute medal can be either gold, silver, bronze, or null. So we have the following data:

(193, Tennis, John Doe, Gold)
(931, Skiing, Mary White, Bronze)
(193, Tennis, Arnold Black, null)
(182, Bobsledding, John Doe, Gold)
(901, Ping-Pong, Adam Brown, Silver)
(248, Bobsledding, Mary White, Silver)

I am having a very hard time figuring out how to answer this question: Get the names of all players who have won more than one medal. In this data the answers would be John Doe and Mary White. How could I get that answer on arbitrary data for this relation using relational algebra?

(This is a simplified version of the actual homework problem, and this simplification represents (I hope) the part of that problem I'm struggling with. There are an arbitrary and unknown number of competitions, sports, and players, but only 4 possibilities for medal)

Upvotes: 7

Views: 37735

Answers (4)

Martin
Martin

Reputation: 15

I have a similar problem. I resolved thinking if I match every person with himself, and then filter only when the attribute is different, I get the people with more that one value for that attribute, in this case, medal.

Player1 = π compId,playerName,medal (ρ Player1 (Competition))
Player2 = π compId,playerName,medal (ρ Player2 (Competition))
π Player1.playerName (σ Player1.medal<>Player.medal (Player1 ⨝ Player1.compId=Player2.compId Player2))

Upvotes: 0

jairhumberto
jairhumberto

Reputation: 614

I know this is a very old post, but I'm learning this subject, and I did find another way to represent what the exercise asks. Maybe it can be usefull for someone searching for this.

The answer:

π Lc.playerName (
    ρ Lc σ medal ≠ 'null' Competition
            ⨝ Lc.playerName = Rc.playerName AND Lc.sport ≠ Rc.sport
    ρ Rc σ medal ≠ 'null' Competition)

I'm assuming that nobody can have two medals in the same sport, if somone have more than one medal, it has to be in different sports. I'm also assuming that each tuple in the relation represents an awarding, so ...

The expression above manages to get the playerName that appears more than once for different sports.


Edited: I treated null... you can see it in action with this tool https://dbis-uibk.github.io/relax/calc.htm

to set the data, use this this code in the Group Editor tab

group:Competition

Competition = {
    compId:number, sport:string , playerName:string, medal:string
    193          , 'Tennis'     , 'John Doe'       , 'Gold'      
    931          , 'Skiing'     , 'Mary White'     , 'Bronze'    
    193          , 'Tennis'     , 'Arnold Black'   , 'null'      
    182          , 'Bobsledding', 'John Doe'       , 'Gold'      
    901          , 'Ping-Pong'  , 'Adam Brown'     , 'Silver'    
    248          , 'Bobsledding', 'Mary White'     , 'Silver'    
}

Upvotes: 0

philipxy
philipxy

Reputation: 15118

Get the names of all players who have won more than one medal.

(It's not clear what this means. Have won than one kind of medal? Or have received more than one medal? Your example answer suggests the latter. Also, it treats "null" as just another kind of medal, not specially as in SQL.)

-- rows where
    THERE EXISTS compId,sport,medal,compId1,compId2,medal2 SUCH THAT
        in competition [compId] of sport [sport] player [playerName] won [medal]
    AND in competition [compId2] of sport [sport2] player [playerName] won [medal2]
    AND (compId <> compId2 OR sport <> sport2 OR medal <> medal2)

Using statement shorthand:

-- rows where
    THERE EXISTS compId,sport,medal,compId1,compId2,medal2 SUCH THAT
        Competition(compId, sport, playerName, medal)
    AND Competition(compId2, sport2, playerName, medal2)
    AND (compId <> compId2 OR sport <> sport2 OR medal <> medal2)

Rearranging (anticipating the limitations of one comparison per σ and one attribute set per ∪):

-- rows where
    THERE EXISTS compId,sport,medal,compId1,compId2,medal2 SUCH THAT
        (   Competition(compId, sport, playerName, medal)
        AND Competition(compId2, sport2, playerName, medal2)
        AND compId <> compId2)
    OR (   Competition(compId, sport, playerName, medal)
        AND Competition(compId2, sport2, playerName, medal2)
        AND sport <> sport2)
    OR (   Competition(compId, sport, playerName, medal)
        AND Competition(compId2, sport2, playerName, medal2)
        AND medal <> medal2)

Now to get the algebra replace:

  • every statement by its table/relation
  • every AND of table/relation by ⋈ (natural join)
  • every OR of table/relation (which must have the same columns/attributes) by ∪ (union)
  • every AND NOT (which must have the same columns/attributes) by \ (difference)
  • every AND comparison by σ comparison (select/restrict)
  • every EXISTS names to drop by π names to keep (projection)
  • every column/attribute renaming by ρ (rename).

    π playerName (
        σ compId <> compId2 (Competition
            ⋈ ρ compID2/compID ρ sport2/sport ρ medal2/medal Competition)
    ∪   σ sport <> sport2 (Competition
            ⋈ ρ compID2/compID ρ sport2/sport ρ medal2/medal Competition)
    ∪   σ medal <> medal2 (Competition
            ⋈ ρ compID2/compID ρ sport2/sport ρ medal2/medal Competition)
    )
    

(For more see this answer.)

Upvotes: 2

James Williams
James Williams

Reputation: 29

There is a much simpler way to solve this problem, in my opinion:

enter image description here

Essentially, you find the relation of records where medal isn't null, and then join this record with itself, joining on name. The resulting records will be the ones where there are duplicate names.

Upvotes: 0

Related Questions