Reputation: 87
I'm looking to put the data from this webpage: http://live.nhl.com/GameData/20162017/2016020725/PlayByPlay.json into a usable R data frame.
I've tried what I've seen so far by using:
library(jsonlite)
json <- "http://live.nhl.com/GameData/20162017/2016020725/PlayByPlay.json"
doc <- fromJSON(json, simplifyDataFrame = TRUE)
That puts the file into a list of 1, and to be honest, working with lists in R is not yet a skill of mine (I'm more comfortable with data frames).
I'd like to be able to scrape that webpage into a usable data frame.
I've tried
PBP <- rbindlist(lapply(doc, as.data.table), fill = TRUE)
but that did not work.
Any ideas? Happy to provide any more info if needed.
Upvotes: 1
Views: 101
Reputation: 70653
Perhaps the first course of action would be to understand lists down to the bone. What you have there is a list of length 1. If you run names(doc), you will notice that this list element is named data. To fully reveal the structure of the object, try str(doc). That's a lot of output! Here are the first few lines to give you a sense of what is going on.
You can work with lists using [[ and $. [ works too, but see this tweet for details. You can access the first element with doc$data, doc[[1]] or doc[["data"]]. All are equivalent, but some may be handier for some tasks. To "climb" down the list tree, just append extra arguments. Note that you can mix all of these. See the inline code for a sneak preview. From your question it's not clear what part of the JSON file you're after. Try expanding the question or, even better, tinker around with doc.
doc:
data                        # doc[[1]] or doc[["data"]] or doc$data
|___ refreshInterval        # doc[[1]][[1]] or doc[[1]][["refreshInterval"]] or doc[["data"]][["refreshInterval"]] or doc$data$refreshInterval
|___ game                   # doc[[1]][[2]] or doc[[1]][["game"]] or you get the idea
     |___ awayteamid        # doc$data$game$awayteamid
     |___ awayteamname
     |___ hometeamname
     |___ plays
     |___ awayteamnick
     |___ hometeamnick
     |___ hometeamid
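If you want to convince yourself that these access styles really are interchangeable, here is a minimal sketch on a small stand-in list. The object toy and all of its values are made up for illustration; only the nesting is meant to resemble doc.

```r
# Hypothetical stand-in for `doc`: same nesting idea, made-up values.
toy <- list(
  data = list(
    refreshInterval = 0,
    game = list(
      awayteamid = 3,
      plays = list(
        play = data.frame(
          desc   = c("A hit B", "C shot saved by D"),
          teamid = c(4, 3),
          stringsAsFactors = FALSE
        )
      )
    )
  )
)

# Three ways to reach the same top-level element.
by_dollar   <- toy$data
by_position <- toy[[1]]
by_name     <- toy[["data"]]
identical(by_dollar, by_position) && identical(by_position, by_name)

# `[` keeps the enclosing list, `[[` pulls the element out of it.
wrapped   <- toy["data"]    # still a list of length 1, named "data"
unwrapped <- toy[["data"]]  # the inner list itself

# The styles can be mixed freely while climbing down the tree.
plays <- toy[["data"]]$game[["plays"]]$play
is.data.frame(plays)
```

Note that names are case-sensitive, so toy$data$refreshinterval returns NULL here while toy$data$refreshInterval returns 0.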
You can access game stats through
xy <- doc$data$game$plays$play
xy[1:6, c("desc", "type", "p2name", "teamid", "ycoord", "xcoord")]
  desc                                               type  p2name            teamid  ycoord  xcoord
1 Radko Gudas hit Chris Kreider                      Hit   Chris Kreider          4     -12     -96
2 Pavel Buchnevich Wrist Shot saved by Steve Mason   Shot  Steve Mason            3      26     -42
3 Brandon Pirri hit Brandon Manning                  Hit   Brandon Manning        3      42     -68
4 Nick Cousins hit Adam Clendening                   Hit   Adam Clendening        4      35      92
5 Nick Cousins Wrist Shot saved by Henrik Lundqvist  Shot  Henrik Lundqvist       4      19      86
6 Michael Grabner Wrist Shot saved by Steve Mason    Shot  Steve Mason            3       5     -63
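Since the feed may not always be reachable, here is a rough offline sketch of the same idea. The JSON string below is a hypothetical, heavily trimmed stand-in with two made-up plays, not the real file. It illustrates that fromJSON() already turns the array of plays into a data frame, so no rbindlist() step is needed once you have climbed down to the right element.

```r
library(jsonlite)

# Hypothetical, trimmed stand-in for the feed (values are made up).
txt <- '{"data": {"refreshInterval": 0, "game": {"awayteamid": 3, "plays": {"play": [
  {"desc": "A hit B", "type": "Hit", "teamid": 4, "xcoord": -96, "ycoord": -12},
  {"desc": "C shot saved by D", "type": "Shot", "teamid": 3, "xcoord": -42, "ycoord": 26}
]}}}}'

toy_doc <- fromJSON(txt, simplifyDataFrame = TRUE)

# Peek at the top levels only, to keep the output short.
str(toy_doc, max.level = 3)

# The array of play objects has already been simplified to a data frame.
plays <- toy_doc$data$game$plays$play
is.data.frame(plays)

# From here on, ordinary data frame tools apply, e.g. keep only the shots.
shots <- plays[plays$type == "Shot", c("desc", "teamid", "xcoord", "ycoord")]
shots
```

The max.level argument of str() is handy on the real object too, because it limits how deep the printed structure goes.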
Upvotes: 3