George Rodman
George Rodman

Reputation: 349

Parse JSON in Postgres

I have the following JSON stored in a JSONB column called 'data' within a Postgres table:

{"resource":"boxscore","parameters":{"GameID":"0021700079","StartPeriod":0,"EndPeriod":0,"StartRange":0,"EndRange":0,"RangeType":0},"resultSets":[{"name":"PlayerStats","headers":["GAME_ID","TEAM_ID","TEAM_ABBREVIATION","TEAM_CITY","PLAYER_ID","PLAYER_NAME","START_POSITION","COMMENT","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS","PLUS_MINUS"],"rowSet":[["0021700079",1610612760,"OKC","Oklahoma City",202331,"Paul George","F","","28:43",8,15,0.533,4,5,0.800,0,0,0.000,1,4,5,0,4,0,3,2,20,20.000],["0021700079",1610612760,"OKC","Oklahoma City",2546,"Carmelo Anthony","F","","27:41",7,16,0.438,5,9,0.556,2,2,1.000,0,5,5,2,0,0,0,2,21,26.000],["0021700079",1610612760,"OKC","Oklahoma City",203500,"Steven Adams","C","","28:11",6,10,0.600,0,0,0.000,0,0,0.000,5,3,8,2,1,1,1,0,12,23.000],["0021700079",1610612760,"OKC","Oklahoma City",203460,"Andre Roberson","G","","20:22",2,2,1.000,1,1,1.000,0,0,0.000,1,0,1,1,0,1,0,2,5,21.000],["0021700079",1610612760,"OKC","Oklahoma City",201566,"Russell Westbrook","G","","28:25",5,13,0.385,0,2,0.000,2,3,0.667,2,11,13,13,2,0,2,2,12,26.000],["0021700079",1610612760,"OKC","Oklahoma City",203924,"Jerami Grant","","","23:00",2,4,0.500,0,1,0.000,2,3,0.667,0,4,4,1,3,2,1,2,6,2.000],["0021700079",1610612760,"OKC","Oklahoma City",203518,"Alex Abrines","","","18:09",2,8,0.250,1,5,0.200,0,0,0.000,0,2,2,1,2,0,1,2,5,-1.000],["0021700079",1610612760,"OKC","Oklahoma City",101109,"Raymond Felton","","","20:24",3,8,0.375,2,4,0.500,4,6,0.667,1,7,8,3,2,1,1,1,12,7.000],["0021700079",1610612760,"OKC","Oklahoma City",202335,"Patrick Patterson","","","15:01",1,5,0.200,0,4,0.000,0,0,0.000,1,1,2,0,2,0,1,0,2,2.000],["0021700079",1610612760,"OKC","Oklahoma City",203962,"Josh Huestis","","","17:02",1,5,0.200,0,3,0.000,0,0,0.000,0,2,2,1,0,1,1,2,2,13.000],["0021700079",1610612760,"OKC","Oklahoma City",1628390,"Terrance Ferguson","","","4:48",1,1,1.000,0,0,0.000,0,0,0.000,0,1,1,0,0,0,0,2,2,7.000],["0021700079",1610612760,"OKC","Oklahoma City",2555,"Nick Collison","","","4:48",0,0,0.000,0,0,0.000,0,0,0.000,0,0,0,1,0,0,0,0,0,7.000],["0021700079",1610612760,"OKC","Oklahoma City",1626177,"Dakari Johnson","","","3:26",1,1,1.000,0,0,0.000,0,0,0.000,0,1,1,3,0,2,0,0,2,7.000],["0021700079",1610612741,"CHI","Chicago",1627835,"Paul Zipser","F","","17:05",1,5,0.200,1,3,0.333,0,0,0.000,1,2,3,0,1,0,3,1,3,-18.000],["0021700079",1610612741,"CHI","Chicago",1628374,"Lauri Markkanen","F","","27:54",3,7,0.429,3,6,0.500,6,6,1.000,3,5,8,1,0,1,0,1,15,-17.000],["0021700079",1610612741,"CHI","Chicago",201577,"Robin Lopez","C","","25:22",4,9,0.444,0,0,0.000,2,2,1.000,1,2,3,2,0,2,1,1,10,-13.000],["0021700079",1610612741,"CHI","Chicago",203200,"Justin Holiday","G","","31:39",4,16,0.250,3,10,0.300,0,1,0.000,2,3,5,2,1,0,4,1,11,-18.000],["0021700079",1610612741,"CHI","Chicago",1626170,"Jerian Grant","G","","21:00",0,7,0.000,0,6,0.000,2,3,0.667,1,1,2,4,1,0,0,1,2,-17.000],["0021700079",1610612741,"CHI","Chicago",1627756,"Denzel Valentine","","","30:55",3,12,0.250,2,5,0.400,0,0,0.000,3,6,9,3,1,0,3,2,8,-14.000],["0021700079",1610612741,"CHI","Chicago",202347,"Quincy Pondexter","","","20:06",1,7,0.143,1,7,0.143,3,4,0.750,0,3,3,0,0,0,3,3,6,-15.000],["0021700079",1610612741,"CHI","Chicago",1627739,"Kris Dunn","","","22:12",4,9,0.444,0,0,0.000,0,0,0.000,1,3,4,3,3,1,4,5,8,-8.000],["0021700079",1610612741,"CHI","Chicago",1626245,"Cristiano Felicio","","","22:38",1,2,0.500,0,0,0.000,0,0,0.000,0,8,8,0,0,1,1,0,2,-19.000],["0021700079",1610612741,"CHI","Chicago",1628021,"David Nwaba","","","16:21",1,3,0.333,0,0,0.000,2,4,0.500,0,2,2,0,2,0,0,1,4,-14.000],["0021700079",1610612741,"CHI","Chicago",1627770,"Kay Felder","","","4:48",0,1,0.000,0,0,0.000,0,0,0.000,0,0,0,1,0,0,0,0,0,-7.000]]},{"name":"TeamStats","headers":["GAME_ID","TEAM_ID","TEAM_NAME","TEAM_ABBREVIATION","TEAM_CITY","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS","PLUS_MINUS"],"rowSet":[["0021700079",1610612760,"Thunder","OKC","Oklahoma City","240:00",39,88,0.443,13,34,0.382,10,14,0.714,11,41,52,28,16,8,11,17,101,32.000000],["0021700079",1610612741,"Bulls","CHI","Chicago","240:00",22,78,0.282,10,37,0.270,15,20,0.750,12,35,47,16,9,5,19,16,69,-32.000000]]},{"name":"TeamStarterBenchStats","headers":["GAME_ID","TEAM_ID","TEAM_NAME","TEAM_ABBREVIATION","TEAM_CITY","STARTERS_BENCH","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS"],"rowSet":[["0021700079",1610612760,"Thunder","OKC","Oklahoma City","Starters","133:22",28,56,0.500,10,17,0.588,4,5,0.800,9,23,32,18,7,2,6,8,70],["0021700079",1610612760,"Thunder","OKC","Oklahoma City","Bench","106:38",11,32,0.344,3,17,0.176,6,9,0.667,2,18,20,10,9,6,5,9,31],["0021700079",1610612741,"Bulls","CHI","Chicago","Starters","123:00",12,44,0.273,7,25,0.280,10,12,0.833,8,13,21,9,3,3,8,5,41],["0021700079",1610612741,"Bulls","CHI","Chicago","Bench","117:00",10,34,0.294,3,12,0.250,5,8,0.625,4,22,26,7,6,2,11,11,28]]}]}

I need to parse it to pull out the box score statistics for both the individual players and teams, and subsequently put them in another table where it's one row per player, and all the different statistics are placed in columns. For example, the final table should look like:

Player           | PTS | REB | AST | etc....
Paul George      | 20  |  5  |  0  | etc....
Carmelo Anthony  | 21  |  5  |  2  | etc....

However, it looks as if the values are stored in a list. How do I utilize the JSONB functions to parse these data points out? I've tried looking at the official Postgres documentation, but can't find a way to specifically pull this data out in the fashion I need it.

I inserted the data into the Postgres database via Python. So if it's easier/wiser to parse the data within Python first and transfer it into Postgres, please let me know.

Upvotes: 3

Views: 3612

Answers (1)

Dan
Dan

Reputation: 2705

Parsing all of this out manually will be tedious, but it's the most obvious solution:

SELECT
  rowSet -> 0 AS "GAME_ID",
  rowSet -> 1 AS "TEAM_ID",
  rowSet -> 2 AS "TEAM_ABBREVIATION",
  rowSet -> 3 AS "TEAM_CITY",
  rowSet -> 4 AS "PLAYER_ID",
  rowSet -> 5 AS "PLAYER_NAME",
  rowSet -> 6 AS "START_POSITION",
  rowSet -> 7 AS "COMMENT",
  rowSet -> 8 AS "MIN",
  rowSet -> 9 AS "FGM",
  rowSet -> 10 AS "FGA",
  rowSet -> 11 AS "FG_PCT",
  rowSet -> 12 AS "FG3M",
  rowSet -> 13 AS "FG3A",
  rowSet -> 14 AS "FG3_PCT",
  rowSet -> 15 AS "FTM",
  rowSet -> 16 AS "FTA",
  rowSet -> 17 AS "FT_PCT",
  rowSet -> 18 AS "OREB",
  rowSet -> 19 AS "DREB",
  rowSet -> 20 AS "REB",
  rowSet -> 21 AS "AST",
  rowSet -> 22 AS "STL",
  rowSet -> 23 AS "BLK",
  rowSet -> 24 AS "TO",
  rowSet -> 25 AS "PF",
  rowSet -> 26 AS "PTS",
  rowSet -> 27 AS "PLUS_MINUS"
  FROM
  (
    SELECT
      JSONB_ARRAY_ELEMENTS(resultSets -> 'rowSet') AS rowSet
    FROM
      (
        SELECT
          JSONB_ARRAY_ELEMENTS(data -> 'resultSets') AS resultSets -- Get individual players
          FROM
          (
          SELECT
            '{"resource":"boxscore","parameters":{"GameID":"0021700079","StartPeriod":0,"EndPeriod":0,"StartRange":0,"EndRange":0,"RangeType":0},"resultSets":[{"name":"PlayerStats","headers":["GAME_ID","TEAM_ID","TEAM_ABBREVIATION","TEAM_CITY","PLAYER_ID","PLAYER_NAME","START_POSITION","COMMENT","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS","PLUS_MINUS"],"rowSet":[["0021700079",1610612760,"OKC","Oklahoma City",202331,"Paul George","F","","28:43",8,15,0.533,4,5,0.800,0,0,0.000,1,4,5,0,4,0,3,2,20,20.000],["0021700079",1610612760,"OKC","Oklahoma City",2546,"Carmelo Anthony","F","","27:41",7,16,0.438,5,9,0.556,2,2,1.000,0,5,5,2,0,0,0,2,21,26.000],["0021700079",1610612760,"OKC","Oklahoma City",203500,"Steven Adams","C","","28:11",6,10,0.600,0,0,0.000,0,0,0.000,5,3,8,2,1,1,1,0,12,23.000],["0021700079",1610612760,"OKC","Oklahoma City",203460,"Andre Roberson","G","","20:22",2,2,1.000,1,1,1.000,0,0,0.000,1,0,1,1,0,1,0,2,5,21.000],["0021700079",1610612760,"OKC","Oklahoma City",201566,"Russell Westbrook","G","","28:25",5,13,0.385,0,2,0.000,2,3,0.667,2,11,13,13,2,0,2,2,12,26.000],["0021700079",1610612760,"OKC","Oklahoma City",203924,"Jerami Grant","","","23:00",2,4,0.500,0,1,0.000,2,3,0.667,0,4,4,1,3,2,1,2,6,2.000],["0021700079",1610612760,"OKC","Oklahoma City",203518,"Alex Abrines","","","18:09",2,8,0.250,1,5,0.200,0,0,0.000,0,2,2,1,2,0,1,2,5,-1.000],["0021700079",1610612760,"OKC","Oklahoma City",101109,"Raymond Felton","","","20:24",3,8,0.375,2,4,0.500,4,6,0.667,1,7,8,3,2,1,1,1,12,7.000],["0021700079",1610612760,"OKC","Oklahoma City",202335,"Patrick Patterson","","","15:01",1,5,0.200,0,4,0.000,0,0,0.000,1,1,2,0,2,0,1,0,2,2.000],["0021700079",1610612760,"OKC","Oklahoma City",203962,"Josh Huestis","","","17:02",1,5,0.200,0,3,0.000,0,0,0.000,0,2,2,1,0,1,1,2,2,13.000],["0021700079",1610612760,"OKC","Oklahoma City",1628390,"Terrance Ferguson","","","4:48",1,1,1.000,0,0,0.000,0,0,0.000,0,1,1,0,0,0,0,2,2,7.000],["0021700079",1610612760,"OKC","Oklahoma City",2555,"Nick Collison","","","4:48",0,0,0.000,0,0,0.000,0,0,0.000,0,0,0,1,0,0,0,0,0,7.000],["0021700079",1610612760,"OKC","Oklahoma City",1626177,"Dakari Johnson","","","3:26",1,1,1.000,0,0,0.000,0,0,0.000,0,1,1,3,0,2,0,0,2,7.000],["0021700079",1610612741,"CHI","Chicago",1627835,"Paul Zipser","F","","17:05",1,5,0.200,1,3,0.333,0,0,0.000,1,2,3,0,1,0,3,1,3,-18.000],["0021700079",1610612741,"CHI","Chicago",1628374,"Lauri Markkanen","F","","27:54",3,7,0.429,3,6,0.500,6,6,1.000,3,5,8,1,0,1,0,1,15,-17.000],["0021700079",1610612741,"CHI","Chicago",201577,"Robin Lopez","C","","25:22",4,9,0.444,0,0,0.000,2,2,1.000,1,2,3,2,0,2,1,1,10,-13.000],["0021700079",1610612741,"CHI","Chicago",203200,"Justin Holiday","G","","31:39",4,16,0.250,3,10,0.300,0,1,0.000,2,3,5,2,1,0,4,1,11,-18.000],["0021700079",1610612741,"CHI","Chicago",1626170,"Jerian Grant","G","","21:00",0,7,0.000,0,6,0.000,2,3,0.667,1,1,2,4,1,0,0,1,2,-17.000],["0021700079",1610612741,"CHI","Chicago",1627756,"Denzel Valentine","","","30:55",3,12,0.250,2,5,0.400,0,0,0.000,3,6,9,3,1,0,3,2,8,-14.000],["0021700079",1610612741,"CHI","Chicago",202347,"Quincy Pondexter","","","20:06",1,7,0.143,1,7,0.143,3,4,0.750,0,3,3,0,0,0,3,3,6,-15.000],["0021700079",1610612741,"CHI","Chicago",1627739,"Kris Dunn","","","22:12",4,9,0.444,0,0,0.000,0,0,0.000,1,3,4,3,3,1,4,5,8,-8.000],["0021700079",1610612741,"CHI","Chicago",1626245,"Cristiano Felicio","","","22:38",1,2,0.500,0,0,0.000,0,0,0.000,0,8,8,0,0,1,1,0,2,-19.000],["0021700079",1610612741,"CHI","Chicago",1628021,"David Nwaba","","","16:21",1,3,0.333,0,0,0.000,2,4,0.500,0,2,2,0,2,0,0,1,4,-14.000],["0021700079",1610612741,"CHI","Chicago",1627770,"Kay Felder","","","4:48",0,1,0.000,0,0,0.000,0,0,0.000,0,0,0,1,0,0,0,0,0,-7.000]]},{"name":"TeamStats","headers":["GAME_ID","TEAM_ID","TEAM_NAME","TEAM_ABBREVIATION","TEAM_CITY","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS","PLUS_MINUS"],"rowSet":[["0021700079",1610612760,"Thunder","OKC","Oklahoma City","240:00",39,88,0.443,13,34,0.382,10,14,0.714,11,41,52,28,16,8,11,17,101,32.000000],["0021700079",1610612741,"Bulls","CHI","Chicago","240:00",22,78,0.282,10,37,0.270,15,20,0.750,12,35,47,16,9,5,19,16,69,-32.000000]]},{"name":"TeamStarterBenchStats","headers":["GAME_ID","TEAM_ID","TEAM_NAME","TEAM_ABBREVIATION","TEAM_CITY","STARTERS_BENCH","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS"],"rowSet":[["0021700079",1610612760,"Thunder","OKC","Oklahoma City","Starters","133:22",28,56,0.500,10,17,0.588,4,5,0.800,9,23,32,18,7,2,6,8,70],["0021700079",1610612760,"Thunder","OKC","Oklahoma City","Bench","106:38",11,32,0.344,3,17,0.176,6,9,0.667,2,18,20,10,9,6,5,9,31],["0021700079",1610612741,"Bulls","CHI","Chicago","Starters","123:00",12,44,0.273,7,25,0.280,10,12,0.833,8,13,21,9,3,3,8,5,41],["0021700079",1610612741,"Bulls","CHI","Chicago","Bench","117:00",10,34,0.294,3,12,0.250,5,8,0.625,4,22,26,7,6,2,11,11,28]]}]}'::JSONB AS data
          ) _data
      ) _result
  ) _row

You have a nice JSON dataset containing { resources, parameters, resultSets }. I care most about the resultSets because it contains the player information. This unmodified dataset I call _data.

data contains resultSets. resultSets is an array, and I want to spread this array out as individual rows.

JSONB_ARRAY_ELEMENTS(data -> 'resultSets')

Looking at the structure again, you don't even need to spread this object. The 0 index contains the PlayerStats information. So alternatively, you can replace it with:

data -> 'resultSets' -> 0 AS resultSets 

resultSets contains rowSets. rowSets has all the player information as an array. This data is spread into individual rows once more.

JSONB_ARRAY_ELEMENTS(resultSets -> 'rowSet')

Now, you have a JSONB array object at each row containing all the user information called rowSet. You can manually SELECT each column by their array index to display them.

rowSet -> 0 AS "GAME_ID"

You can format your JSON into a more readable format before trying to parse it.

Upvotes: 3

Related Questions