Reputation: 349
I have the following JSON stored in a JSONB column called 'data' within a Postgres table:
{"resource":"boxscore","parameters":{"GameID":"0021700079","StartPeriod":0,"EndPeriod":0,"StartRange":0,"EndRange":0,"RangeType":0},"resultSets":[{"name":"PlayerStats","headers":["GAME_ID","TEAM_ID","TEAM_ABBREVIATION","TEAM_CITY","PLAYER_ID","PLAYER_NAME","START_POSITION","COMMENT","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS","PLUS_MINUS"],"rowSet":[["0021700079",1610612760,"OKC","Oklahoma City",202331,"Paul George","F","","28:43",8,15,0.533,4,5,0.800,0,0,0.000,1,4,5,0,4,0,3,2,20,20.000],["0021700079",1610612760,"OKC","Oklahoma City",2546,"Carmelo Anthony","F","","27:41",7,16,0.438,5,9,0.556,2,2,1.000,0,5,5,2,0,0,0,2,21,26.000],["0021700079",1610612760,"OKC","Oklahoma City",203500,"Steven Adams","C","","28:11",6,10,0.600,0,0,0.000,0,0,0.000,5,3,8,2,1,1,1,0,12,23.000],["0021700079",1610612760,"OKC","Oklahoma City",203460,"Andre Roberson","G","","20:22",2,2,1.000,1,1,1.000,0,0,0.000,1,0,1,1,0,1,0,2,5,21.000],["0021700079",1610612760,"OKC","Oklahoma City",201566,"Russell Westbrook","G","","28:25",5,13,0.385,0,2,0.000,2,3,0.667,2,11,13,13,2,0,2,2,12,26.000],["0021700079",1610612760,"OKC","Oklahoma City",203924,"Jerami Grant","","","23:00",2,4,0.500,0,1,0.000,2,3,0.667,0,4,4,1,3,2,1,2,6,2.000],["0021700079",1610612760,"OKC","Oklahoma City",203518,"Alex Abrines","","","18:09",2,8,0.250,1,5,0.200,0,0,0.000,0,2,2,1,2,0,1,2,5,-1.000],["0021700079",1610612760,"OKC","Oklahoma City",101109,"Raymond Felton","","","20:24",3,8,0.375,2,4,0.500,4,6,0.667,1,7,8,3,2,1,1,1,12,7.000],["0021700079",1610612760,"OKC","Oklahoma City",202335,"Patrick Patterson","","","15:01",1,5,0.200,0,4,0.000,0,0,0.000,1,1,2,0,2,0,1,0,2,2.000],["0021700079",1610612760,"OKC","Oklahoma City",203962,"Josh Huestis","","","17:02",1,5,0.200,0,3,0.000,0,0,0.000,0,2,2,1,0,1,1,2,2,13.000],["0021700079",1610612760,"OKC","Oklahoma City",1628390,"Terrance Ferguson","","","4:48",1,1,1.000,0,0,0.000,0,0,0.000,0,1,1,0,0,0,0,2,2,7.000],["0021700079",1610612760,"OKC","Oklahoma City",2555,"Nick Collison","","","4:48",0,0,0.000,0,0,0.000,0,0,0.000,0,0,0,1,0,0,0,0,0,7.000],["0021700079",1610612760,"OKC","Oklahoma City",1626177,"Dakari Johnson","","","3:26",1,1,1.000,0,0,0.000,0,0,0.000,0,1,1,3,0,2,0,0,2,7.000],["0021700079",1610612741,"CHI","Chicago",1627835,"Paul Zipser","F","","17:05",1,5,0.200,1,3,0.333,0,0,0.000,1,2,3,0,1,0,3,1,3,-18.000],["0021700079",1610612741,"CHI","Chicago",1628374,"Lauri Markkanen","F","","27:54",3,7,0.429,3,6,0.500,6,6,1.000,3,5,8,1,0,1,0,1,15,-17.000],["0021700079",1610612741,"CHI","Chicago",201577,"Robin Lopez","C","","25:22",4,9,0.444,0,0,0.000,2,2,1.000,1,2,3,2,0,2,1,1,10,-13.000],["0021700079",1610612741,"CHI","Chicago",203200,"Justin Holiday","G","","31:39",4,16,0.250,3,10,0.300,0,1,0.000,2,3,5,2,1,0,4,1,11,-18.000],["0021700079",1610612741,"CHI","Chicago",1626170,"Jerian Grant","G","","21:00",0,7,0.000,0,6,0.000,2,3,0.667,1,1,2,4,1,0,0,1,2,-17.000],["0021700079",1610612741,"CHI","Chicago",1627756,"Denzel Valentine","","","30:55",3,12,0.250,2,5,0.400,0,0,0.000,3,6,9,3,1,0,3,2,8,-14.000],["0021700079",1610612741,"CHI","Chicago",202347,"Quincy Pondexter","","","20:06",1,7,0.143,1,7,0.143,3,4,0.750,0,3,3,0,0,0,3,3,6,-15.000],["0021700079",1610612741,"CHI","Chicago",1627739,"Kris Dunn","","","22:12",4,9,0.444,0,0,0.000,0,0,0.000,1,3,4,3,3,1,4,5,8,-8.000],["0021700079",1610612741,"CHI","Chicago",1626245,"Cristiano Felicio","","","22:38",1,2,0.500,0,0,0.000,0,0,0.000,0,8,8,0,0,1,1,0,2,-19.000],["0021700079",1610612741,"CHI","Chicago",1628021,"David Nwaba","","","16:21",1,3,0.333,0,0,0.000,2,4,0.500,0,2,2,0,2,0,0,1,4,-14.000],["0021700079",1610612741,"CHI","Chicago",1627770,"Kay Felder","","","4:48",0,1,0.000,0,0,0.000,0,0,0.000,0,0,0,1,0,0,0,0,0,-7.000]]},{"name":"TeamStats","headers":["GAME_ID","TEAM_ID","TEAM_NAME","TEAM_ABBREVIATION","TEAM_CITY","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS","PLUS_MINUS"],"rowSet":[["0021700079",1610612760,"Thunder","OKC","Oklahoma City","240:00",39,88,0.443,13,34,0.382,10,14,0.714,11,41,52,28,16,8,11,17,101,32.000000],["0021700079",1610612741,"Bulls","CHI","Chicago","240:00",22,78,0.282,10,37,0.270,15,20,0.750,12,35,47,16,9,5,19,16,69,-32.000000]]},{"name":"TeamStarterBenchStats","headers":["GAME_ID","TEAM_ID","TEAM_NAME","TEAM_ABBREVIATION","TEAM_CITY","STARTERS_BENCH","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS"],"rowSet":[["0021700079",1610612760,"Thunder","OKC","Oklahoma City","Starters","133:22",28,56,0.500,10,17,0.588,4,5,0.800,9,23,32,18,7,2,6,8,70],["0021700079",1610612760,"Thunder","OKC","Oklahoma City","Bench","106:38",11,32,0.344,3,17,0.176,6,9,0.667,2,18,20,10,9,6,5,9,31],["0021700079",1610612741,"Bulls","CHI","Chicago","Starters","123:00",12,44,0.273,7,25,0.280,10,12,0.833,8,13,21,9,3,3,8,5,41],["0021700079",1610612741,"Bulls","CHI","Chicago","Bench","117:00",10,34,0.294,3,12,0.250,5,8,0.625,4,22,26,7,6,2,11,11,28]]}]}
I need to parse it to pull out the box score statistics for both the individual players and teams, and subsequently put them in another table where it's one row per player, and all the different statistics are placed in columns. For example, the final table should look like:
Player | PTS | REB | AST | etc....
Paul George | 20 | 5 | 0 | etc....
Carmelo Anthony | 21 | 5 | 2 | etc....
However, it looks as if the values are stored in a list. How do I utilize the JSONB functions to parse these data points out? I've tried looking at the official Postgres documentation, but can't find a way to specifically pull this data out in the fashion I need it.
I inserted the data into the Postgres database via Python. So if it's easier/wiser to parse the data within Python first and transfer it into Postgres, please let me know.
Upvotes: 3
Views: 3612
Reputation: 2705
Parsing all of this out manually will be tedious, but it's the most obvious solution:
SELECT
rowSet -> 0 AS "GAME_ID",
rowSet -> 1 AS "TEAM_ID",
rowSet -> 2 AS "TEAM_ABBREVIATION",
rowSet -> 3 AS "TEAM_CITY",
rowSet -> 4 AS "PLAYER_ID",
rowSet -> 5 AS "PLAYER_NAME",
rowSet -> 6 AS "START_POSITION",
rowSet -> 7 AS "COMMENT",
rowSet -> 8 AS "MIN",
rowSet -> 9 AS "FGM",
rowSet -> 10 AS "FGA",
rowSet -> 11 AS "FG_PCT",
rowSet -> 12 AS "FG3M",
rowSet -> 13 AS "FG3A",
rowSet -> 14 AS "FG3_PCT",
rowSet -> 15 AS "FTM",
rowSet -> 16 AS "FTA",
rowSet -> 17 AS "FT_PCT",
rowSet -> 18 AS "OREB",
rowSet -> 19 AS "DREB",
rowSet -> 20 AS "REB",
rowSet -> 21 AS "AST",
rowSet -> 22 AS "STL",
rowSet -> 23 AS "BLK",
rowSet -> 24 AS "TO",
rowSet -> 25 AS "PF",
rowSet -> 26 AS "PTS",
rowSet -> 27 AS "PLUS_MINUS"
FROM
(
SELECT
JSONB_ARRAY_ELEMENTS(resultSets -> 'rowSet') AS rowSet
FROM
(
SELECT
JSONB_ARRAY_ELEMENTS(data -> 'resultSets') AS resultSets -- Get individual players
FROM
(
SELECT
'{"resource":"boxscore","parameters":{"GameID":"0021700079","StartPeriod":0,"EndPeriod":0,"StartRange":0,"EndRange":0,"RangeType":0},"resultSets":[{"name":"PlayerStats","headers":["GAME_ID","TEAM_ID","TEAM_ABBREVIATION","TEAM_CITY","PLAYER_ID","PLAYER_NAME","START_POSITION","COMMENT","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS","PLUS_MINUS"],"rowSet":[["0021700079",1610612760,"OKC","Oklahoma City",202331,"Paul George","F","","28:43",8,15,0.533,4,5,0.800,0,0,0.000,1,4,5,0,4,0,3,2,20,20.000],["0021700079",1610612760,"OKC","Oklahoma City",2546,"Carmelo Anthony","F","","27:41",7,16,0.438,5,9,0.556,2,2,1.000,0,5,5,2,0,0,0,2,21,26.000],["0021700079",1610612760,"OKC","Oklahoma City",203500,"Steven Adams","C","","28:11",6,10,0.600,0,0,0.000,0,0,0.000,5,3,8,2,1,1,1,0,12,23.000],["0021700079",1610612760,"OKC","Oklahoma City",203460,"Andre Roberson","G","","20:22",2,2,1.000,1,1,1.000,0,0,0.000,1,0,1,1,0,1,0,2,5,21.000],["0021700079",1610612760,"OKC","Oklahoma City",201566,"Russell Westbrook","G","","28:25",5,13,0.385,0,2,0.000,2,3,0.667,2,11,13,13,2,0,2,2,12,26.000],["0021700079",1610612760,"OKC","Oklahoma City",203924,"Jerami Grant","","","23:00",2,4,0.500,0,1,0.000,2,3,0.667,0,4,4,1,3,2,1,2,6,2.000],["0021700079",1610612760,"OKC","Oklahoma City",203518,"Alex Abrines","","","18:09",2,8,0.250,1,5,0.200,0,0,0.000,0,2,2,1,2,0,1,2,5,-1.000],["0021700079",1610612760,"OKC","Oklahoma City",101109,"Raymond Felton","","","20:24",3,8,0.375,2,4,0.500,4,6,0.667,1,7,8,3,2,1,1,1,12,7.000],["0021700079",1610612760,"OKC","Oklahoma City",202335,"Patrick Patterson","","","15:01",1,5,0.200,0,4,0.000,0,0,0.000,1,1,2,0,2,0,1,0,2,2.000],["0021700079",1610612760,"OKC","Oklahoma City",203962,"Josh Huestis","","","17:02",1,5,0.200,0,3,0.000,0,0,0.000,0,2,2,1,0,1,1,2,2,13.000],["0021700079",1610612760,"OKC","Oklahoma City",1628390,"Terrance Ferguson","","","4:48",1,1,1.000,0,0,0.000,0,0,0.000,0,1,1,0,0,0,0,2,2,7.000],["0021700079",1610612760,"OKC","Oklahoma City",2555,"Nick Collison","","","4:48",0,0,0.000,0,0,0.000,0,0,0.000,0,0,0,1,0,0,0,0,0,7.000],["0021700079",1610612760,"OKC","Oklahoma City",1626177,"Dakari Johnson","","","3:26",1,1,1.000,0,0,0.000,0,0,0.000,0,1,1,3,0,2,0,0,2,7.000],["0021700079",1610612741,"CHI","Chicago",1627835,"Paul Zipser","F","","17:05",1,5,0.200,1,3,0.333,0,0,0.000,1,2,3,0,1,0,3,1,3,-18.000],["0021700079",1610612741,"CHI","Chicago",1628374,"Lauri Markkanen","F","","27:54",3,7,0.429,3,6,0.500,6,6,1.000,3,5,8,1,0,1,0,1,15,-17.000],["0021700079",1610612741,"CHI","Chicago",201577,"Robin Lopez","C","","25:22",4,9,0.444,0,0,0.000,2,2,1.000,1,2,3,2,0,2,1,1,10,-13.000],["0021700079",1610612741,"CHI","Chicago",203200,"Justin Holiday","G","","31:39",4,16,0.250,3,10,0.300,0,1,0.000,2,3,5,2,1,0,4,1,11,-18.000],["0021700079",1610612741,"CHI","Chicago",1626170,"Jerian Grant","G","","21:00",0,7,0.000,0,6,0.000,2,3,0.667,1,1,2,4,1,0,0,1,2,-17.000],["0021700079",1610612741,"CHI","Chicago",1627756,"Denzel Valentine","","","30:55",3,12,0.250,2,5,0.400,0,0,0.000,3,6,9,3,1,0,3,2,8,-14.000],["0021700079",1610612741,"CHI","Chicago",202347,"Quincy Pondexter","","","20:06",1,7,0.143,1,7,0.143,3,4,0.750,0,3,3,0,0,0,3,3,6,-15.000],["0021700079",1610612741,"CHI","Chicago",1627739,"Kris Dunn","","","22:12",4,9,0.444,0,0,0.000,0,0,0.000,1,3,4,3,3,1,4,5,8,-8.000],["0021700079",1610612741,"CHI","Chicago",1626245,"Cristiano Felicio","","","22:38",1,2,0.500,0,0,0.000,0,0,0.000,0,8,8,0,0,1,1,0,2,-19.000],["0021700079",1610612741,"CHI","Chicago",1628021,"David Nwaba","","","16:21",1,3,0.333,0,0,0.000,2,4,0.500,0,2,2,0,2,0,0,1,4,-14.000],["0021700079",1610612741,"CHI","Chicago",1627770,"Kay Felder","","","4:48",0,1,0.000,0,0,0.000,0,0,0.000,0,0,0,1,0,0,0,0,0,-7.000]]},{"name":"TeamStats","headers":["GAME_ID","TEAM_ID","TEAM_NAME","TEAM_ABBREVIATION","TEAM_CITY","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS","PLUS_MINUS"],"rowSet":[["0021700079",1610612760,"Thunder","OKC","Oklahoma City","240:00",39,88,0.443,13,34,0.382,10,14,0.714,11,41,52,28,16,8,11,17,101,32.000000],["0021700079",1610612741,"Bulls","CHI","Chicago","240:00",22,78,0.282,10,37,0.270,15,20,0.750,12,35,47,16,9,5,19,16,69,-32.000000]]},{"name":"TeamStarterBenchStats","headers":["GAME_ID","TEAM_ID","TEAM_NAME","TEAM_ABBREVIATION","TEAM_CITY","STARTERS_BENCH","MIN","FGM","FGA","FG_PCT","FG3M","FG3A","FG3_PCT","FTM","FTA","FT_PCT","OREB","DREB","REB","AST","STL","BLK","TO","PF","PTS"],"rowSet":[["0021700079",1610612760,"Thunder","OKC","Oklahoma City","Starters","133:22",28,56,0.500,10,17,0.588,4,5,0.800,9,23,32,18,7,2,6,8,70],["0021700079",1610612760,"Thunder","OKC","Oklahoma City","Bench","106:38",11,32,0.344,3,17,0.176,6,9,0.667,2,18,20,10,9,6,5,9,31],["0021700079",1610612741,"Bulls","CHI","Chicago","Starters","123:00",12,44,0.273,7,25,0.280,10,12,0.833,8,13,21,9,3,3,8,5,41],["0021700079",1610612741,"Bulls","CHI","Chicago","Bench","117:00",10,34,0.294,3,12,0.250,5,8,0.625,4,22,26,7,6,2,11,11,28]]}]}'::JSONB AS data
) _data
) _result
) _row
You have a nice JSON dataset containing { resources, parameters, resultSets }. I care most about the resultSets because it contains the player information. This unmodified dataset I call _data.
data contains resultSets. resultSets is an array, and I want to spread this array out as individual rows.
JSONB_ARRAY_ELEMENTS(data -> 'resultSets')
Looking at the structure again, you don't even need to spread this object. The 0 index contains the PlayerStats information. So alternatively, you can replace it with:
data -> 'resultSets' -> 0 AS resultSets
resultSets contains rowSets. rowSets has all the player information as an array. This data is spread into individual rows once more.
JSONB_ARRAY_ELEMENTS(resultSets -> 'rowSet')
Now, you have a JSONB array object at each row containing all the user information called rowSet. You can manually SELECT each column by their array index to display them.
rowSet -> 0 AS "GAME_ID"
You can format your JSON into a more readable format before trying to parse it.
Upvotes: 3