Reputation: 61
I have a table, numTable, and I want to find the outliers in each of its columns. Please see my code below:
for (i in names(numTable)) {
  # Calculate the mean and standard deviation of each column
  meanValue <- mean(numTable[, i], na.rm = TRUE)
  stdValue <- sd(numTable[, i], na.rm = TRUE)
  # Count the outliers in each column (more than 3 standard deviations from the mean)
  print(paste("there are", sum(abs(numTable[, i] - meanValue) > 3 * stdValue, na.rm = TRUE), "outliers in the column", i))
}
But I get this error message:
Error in is.data.frame(x) : (list) object cannot be coerced to type 'double'
I fixed this problem by adding numTable <- as.data.frame(numTable) at the beginning.
Could you please tell me why I have to add this line for my code to work? Does it have something to do with the difference between tbl and data.frame?
Thanks.
Upvotes: 2
Views: 12732
Reputation: 1045
Square-bracket indexing of a data.frame has a hidden argument called drop, which defaults to TRUE and says: if you index a single column, try to simplify the result to a vector. See ?'['.
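However, Hadley Wickham considers this unpredictable behaviour, so tbls default to drop = FALSE. As a result, numTable[, i] stays a one-column tbl (a list), which sd() cannot coerce to a numeric vector, hence the error.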
If you want to keep using tbls and avoid converting to a data.frame, you could use dplyr::pull to extract a single column as a vector. For example:
is.vector(data.frame(a = 1:10, b = letters[1:10])[, 1])
#> [1] TRUE
is.vector(data.frame(a = 1:10, b = letters[1:10])[, 1, drop = FALSE])
#> [1] FALSE
is.vector(dplyr::tibble(a = 1:10, b = letters[1:10])[, 1])
#> [1] FALSE
is.vector(dplyr::pull(dplyr::tibble(a = 1:10, b = letters[1:10]), 1))
#> [1] TRUE
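Applied to the loop from the question, the same idea might look like the sketch below. It assumes dplyr is installed, uses made-up data in place of the real numTable, and introduces a hypothetical helper countOutliers that is not part of the original code. It also swaps in base R's [[ ]] rather than dplyr::pull, because [[ ]] returns a plain vector for both a data.frame and a tbl when the column is selected by name.

```r
# Hypothetical stand-in for numTable; the values are made up for illustration.
numTable <- dplyr::tibble(
  a = c(1:20, 1000),  # one extreme value added on purpose
  b = c(1:20, NA)     # a missing value, to exercise na.rm = TRUE
)

# Hypothetical helper: count values more than 3 standard deviations from the mean.
countOutliers <- function(x) {
  sum(abs(x - mean(x, na.rm = TRUE)) > 3 * sd(x, na.rm = TRUE), na.rm = TRUE)
}

outlierCounts <- integer(0)
for (i in names(numTable)) {
  # [[ ]] extracts the column as a vector, so mean() and sd() get numeric input
  column <- numTable[[i]]
  outlierCounts[i] <- countOutliers(column)
  print(paste("there are", outlierCounts[i], "outliers in the column", i))
}
```

With this made-up data the loop should report one outlier in column a and none in column b, without converting the tbl to a data.frame first.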
Upvotes: 4