dedpo
dedpo

Reputation: 502

HIVE multiple group by and minus operation

I am trying to join both of these select statements on id and playerID and year and yearID(schema below). Also subtract Aliases HAB - EG Also group by year and id in both select statements to SUM up the values prior to performing division and subtraction in later in hierarchy. It says to group by G to when i try this, seems weird. I dont need to group by G, just id and year because a player can have multiple entries in the table and we need to sum up the G,E H and AB prior to calculations

  Try this:

SELECT
    a.playerID AS ID,
    a.yearID AS yearID,
    (b.HAB - a.EG) AS `HAB-EG`
FROM 
    (SELECT
        SUM(playerID),
        SUM(yearID),
        (E/G) AS EG
    FROM fielding
    WHERE (
            yearID > 2005
            AND yearID < 2009
            AND G > 20 
            )GROUP BY playerID,yearID
    ) AS a
JOIN
    (SELECT
        SUM(id),
        SUM(year),
        (hits/ab) AS HAB
    FROM batting
    WHERE( 
            year > 2005
            AND year < 2009 
            AND ab > 40 
            ) GROUP BY id,year

    ) AS b ON a.playerID = b.id AND a.yearID = b.year;

JUST SCHEMA

CREATE EXTERNAL TABLE IF NOT EXISTS fielding
(playerID STRING ,yearID INT ,teamID STRING ,lgID STRING ,
POS STRING ,G INT ,GS INT , InnOuts INT , PO INT,A INT, E INT,  
DP INT , PB INT , WP INT ,SB INT ,CS INT , ZR INT ) ROW
FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION      '/home/hduser/hivetest/fielding';

JUST THE SCHEMA

 CREATE EXTERNAL TABLE IF NOT EXISTS batting(id STRING, year INT, team STRING,
 league STRING, games INT, ab INT, runs INT, hits INT, doubles INT, triples
 INT, homeruns INT, rbi INT, sb INT, cs INT, walks INT, strikeouts INT, ibb
 INT, hbp INT, sh INT, sf INT, gidp INT) ROW FORMAT DELIMITED FIELDS
 TERMINATED BY ',' LOCATION '/home/hduser/hivetest/batting';

Upvotes: 0

Views: 1248

Answers (1)

ssn
ssn

Reputation: 509

Try this:

SELECT
    a.playerID AS ID,
    a.yearID AS yearID,
    (b.HAB - a.EG) AS `HAB-EG`
FROM 
    (SELECT
        playerID,
        yearID,
        (SUM(E)/SUM(G)) AS EG
    FROM fielding
    WHERE (
            yearID > 2005
            AND yearID < 2009
            AND G > 20 
            )GROUP BY playerID,yearID
    ) AS a
JOIN
    (SELECT
        id,
        year,
        (SUM(hits)/SUM(ab)) AS HAB
    FROM batting
    WHERE( 
            year > 2005
            AND year < 2009 
            AND ab > 40 
            ) GROUP BY id,year

    ) AS b ON a.playerID = b.id AND a.yearID = b.year;

Upvotes: 0

Related Questions