Reputation: 57
To summarize the process: I pulled data from bbref, structured it in a data frame, converted three vectors from character to numeric, and took the subset of players who played more than 20 games. Yet when I reorder the data and display the top 20, only some of the non-qualifiers have been removed, and there are still many observations with fewer than 20 games played.
library(XML)
library(RCurl)
library(plyr)
urladv <- "https://www.basketball-reference.com/leagues/NBA_2019_advanced.html"
urladvdata <- getURL(urladv)
dataadv <- readHTMLTable(urladvdata, stringsAsFactors = FALSE, encoding = "UTF-8")
datadv <- structure(dataadv, row.names =c(NA, -734), .Names = seq_along(dataadv), class = "data.frame")
advstats <- ldply(dataadv, data.frame)
advstats[,c('PER', 'BPM')] <- sapply(advstats[,c('PER','BPM', 'G')], as.numeric)
advstats <- subset(advstats, G > 20)
advstats <- advstats[with(advstats,order(-PER)),]
advstats[1:20,]
The output of advstats[1:20,] includes players like Trevon Duval, Gary Payton, and Alan Williams, who each played 5 or fewer games. I'm confused about what is special about these observations, since the subset does remove over 100 others.
Upvotes: 0
Views: 282
Reputation: 684
As mentioned by Ben in the comments, you're missing 'G' from the left-hand side of the line that calls sapply(). It should look like this:
advstats[,c('PER', 'BPM', 'G')] <- sapply(advstats[,c('PER','BPM', 'G')], as.numeric)
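Because 'G' was missing on the left-hand side of the <-, the 'G' column was never converted from <chr> to <dbl>. Therefore, when you ran the subset() function, it didn't filter as intended: with a character column, G > 20 coerces 20 to "20" and compares the values as strings, so "5" > "20" is TRUE while "12" > "20" is FALSE. That is why some players with only a handful of games survived the filter while over 100 other rows were dropped.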
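To see the effect in isolation, here is a minimal sketch using a small made-up data frame (the `toy` object and its values are hypothetical stand-ins, not the scraped table). It only assumes that 'G' is still stored as character at the time subset() runs, in which case R compares strings instead of numbers:

```r
# Hypothetical toy data standing in for the scraped table; G is stored as text.
toy <- data.frame(
  Player = c("A", "B", "C", "D"),
  G = c("5", "12", "3", "82"),
  stringsAsFactors = FALSE
)

# G is character here, so 20 is coerced to "20" and the comparison is
# done string against string rather than number against number.
class(toy$G)
kept_chr <- subset(toy, G > 20)

# After converting G to numeric, the same filter compares numbers.
toy$G <- as.numeric(toy$G)
kept_num <- subset(toy, G > 20)

kept_chr$Player  # players kept by the string comparison
kept_num$Player  # players kept by the numeric comparison
```

Running str(advstats) or class(advstats$G) just before the subset() call is a quick way to confirm which of the two comparisons you are getting.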
I trust that helps?
Upvotes: 1