Reputation: 6209
Following is the head of my data:
dput(head(trucksv[,c(1,5)]))
structure(list(Measur. = c(1L, 2L, 3L, 4L, 5L, 1L), Speed.Mean.Trucks = c(NA,
NA, 9.5, 4.5, NA, NA)), .Names = c("Measur.", "Speed.Mean.Trucks"
), row.names = c(1L, 2L, 3L, 4L, 5L, 17L), class = "data.frame")
I want to find the cumulative distribution of speeds by 'Measur.', for which I used the following function:
f <- function(x) {
hi <- hist(x)
speedmph=round(hi$breaks*0.68,1)
prob=c(0, round(cumsum(hi$counts)/sum(hi$counts),digits=2))
cbind(speedmph, prob)
}
But when I try to apply it to my data, R gives me the following error:
tspdistu <- ddply(trucksv, 'Measur.', summarise, trucksspeedmph = f(Speed.Mean.Trucks))
Error in hist.default(x) : invalid number of 'breaks'
Called from: top level
Browse[1]>
I am not sure how to find the correct number of bins. Please help. Thanks in advance.
Upvotes: 0
Views: 106
Reputation: 78852
The NAs are throwing it off (i.e., it has nothing to do with the number of bins). Here's a slightly modified f() with plotting disabled for hist (it's unlikely you want plots) and with handling for a column subset that is all NAs:
f <- function(x) {
  y <- x[!is.na(x)]  # drop missing speeds before binning
  if (length(y) > 0) {
    hi <- hist(y, plot=FALSE)  # compute the bins without drawing a plot
    speedmph <- round(hi$breaks*0.68,1)
    prob <- c(0, round(cumsum(hi$counts) / sum(hi$counts), digits=2))
    cbind(speedmph, prob)
  } else { # still need to return proper sized values
    cbind(rep(NA, length(x)), rep(NA, length(x)))
  }
}
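To see why the missing values matter, here's a minimal sketch using made-up vectors (all_na and some_na are hypothetical, not taken from the question's data). It suggests that hist copes with a partly-NA vector on its own, and that the error only shows up when a group has no non-missing speeds at all:

```r
# Hypothetical speed vectors for illustration only
all_na  <- c(NA_real_, NA_real_, NA_real_)
some_na <- c(NA, 9.5, 4.5, NA, 12.0)

# A partly-NA vector: hist() bins only the finite values
h <- hist(some_na, plot = FALSE)
print(sum(h$counts))

# An all-NA vector: nothing is left to bin, so hist() stops with an error.
# tryCatch() captures the message instead of dropping into the browser.
res <- tryCatch(
  hist(all_na, plot = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)
```

That is why the length check on the non-missing values is what prevents the error when a 'Measur.' group has no recorded speeds.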
Upvotes: 1