nj2012

Reputation: 105

data frame with 0 columns and 0 rows error

I am writing a function that finds outliers and prints them to the user alongside a symbol that indicates the outlier type. The outliers can be calculated in two ways: the Engineer's method or Tukey's method. The function takes two parameters: a data frame with one column of random numbers, and an option value that determines which method is used to calculate the outliers. The function returns a data frame with two columns: the value of the outlier and its type as a symbol (O or *).

When I call the function and ask it to calculate the outliers using the Engineer's method, it works perfectly. However, with Tukey's method it produces the following message instead of a result: data frame with 0 columns and 0 rows

The following is my code:

findOutliers<- function(numbers,option){
outlierM=c()
outlierE=c()
outlier=c()
typeM=c()
typeE=c()
type=c()
length=nrow(numbers)
print(numbers)


if(option=="eng"){
print("Engineer Methods")
numbersmean= as.numeric(sapply(numbers,mean)) 
numbersd= as.numeric(sapply(numbers,sd))


for(i in 1:length){
zscore= as.numeric((i-numbersmean)/numbersd)
if(zscore>2 & zscore<3){
#cat(zscore," ", "O","\n")
outlierM =c(outlierM,zscore)
typeM=c(typeM, "O")
}#end of if statment

else if(zscore>3){
#cat(zscore," ", "*", "\n")
outlierE =c(outlierE,zscore)
typeE=c(typeE, "*")


}#end of if statment

}#end of for loop
}#end of if statment


else if(option=="tukey"){
print("Tuckey's Methods")
sortedNumbers=numbers[order(numbers$Numbers), ]
IQR=IQR(sortedNumbers)
Q1=as.numeric(quantile(sortedNumbers,0.25))
Q3=as.numeric(quantile(sortedNumbers,0.75))
rangeM1=Q1 - (1.5 * IQR)
rangeM2=Q3 + (1.5 * IQR)
rangeE1=Q1 - (3 * IQR)
rangeE2=Q3 + (3 * IQR)

for(i in 1:length){
if(numbers[i,]<rangeM1|numbers[i,]>rangeM2){
outlierM=c(outlierM,numbers[i])
typeM=c(typeM, "O")
}#end of if statment

else if(numbers[i,]<rangeE1|numbers[i,]>rangeE2){
outlierE=c(outlierE, numbers[i])
typeE=c(typeE, "*")}

}# end of for loop 
}#end of if statment



outlier= c(outlierM,outlierE)
type=c(typeM,typeE)
founOtliers<- data.frame(Outliers=outlier,Type=type)
return(founOtliers)

}#end of function

normalnumbers=rnorm(10)
randomNumbers<- data.frame(Numbers=normalnumbers)
findOutliers(randomNumbers,"eng")
findOutliers(randomNumbers,"tukey")

Upvotes: 0

Views: 6638

Answers (1)

Justin

Reputation: 43255

Two things. First, I suggest indenting your code and using spaces where possible for clarity:

if (x == 1) {
  print ('foo')
} else {
  print ('bar')
}

Second, and more importantly, you are using the if/else syntax incorrectly (see my example above). From ?"if":

In particular, you should not have a newline between ‘}’ and
     ‘else’ to avoid a syntax error in entering a ‘if ... else’

However, that is not the problem, per se, in your code. If you add the lines

print (paste('first check', 
             numbers[i, ] < rangeM1 | numbers[i, ] > rangeM2))
print (paste('second check', 
             numbers[i, ] < rangeE1 | numbers[i, ] > rangeE2))

at the top of your second for loop, you'll see that you never satisfy either if condition. Both outlier vectors therefore stay empty, and you return an empty data.frame.
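To see where the message itself comes from, here is a minimal sketch that skips the loop and assumes nothing matched, so the accumulators are still what `c()` produced. The variable name `empty` is only for illustration.

```r
# Stand-ins for the question's accumulators when no loop iteration matched.
outlier <- c()   # c() with no arguments is NULL
type <- c()

# Building a data.frame from NULL columns drops them entirely.
empty <- data.frame(Outliers = outlier, Type = type)

print(dim(empty))  # both dimensions are 0
print(empty)       # prints the message quoted in the question
```

So the message is not an error raised by R. It is simply how an empty data.frame prints.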

In general, if you're using the if else if syntax, I think it is wise to always include a final else catchall that can provide some helpful advice or a default output.
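As a rough sketch of that pattern, the hypothetical helper below (`classify_value` and its arguments are my own names, not from the question) labels a single value against fences built from the quartiles. One assumption to flag: it tests the wider 3 * IQR fences first, because any value outside them is also outside the 1.5 * IQR fences, so in the question's order the second branch could never be reached.

```r
# Hypothetical helper: label one value against Tukey-style fences.
# q1 and q3 are the first and third quartiles of the data.
classify_value <- function(x, q1, q3) {
  iqr <- q3 - q1
  if (x < q1 - 3 * iqr || x > q3 + 3 * iqr) {
    "*"             # outside the outer (3 * IQR) fences
  } else if (x < q1 - 1.5 * iqr || x > q3 + 1.5 * iqr) {
    "O"             # outside the inner (1.5 * IQR) fences only
  } else {
    NA_character_   # catch-all: not an outlier
  }
}

classify_value(10, q1 = 1, q3 = 3)  # outer fences are -5 and 9
classify_value(7, q1 = 1, q3 = 3)   # inner fences are -2 and 6
classify_value(2, q1 = 1, q3 = 3)   # inside both sets of fences
```

Returning `NA_character_` from the final else is just one choice of default; a message or a warning would serve the same purpose.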

Upvotes: 4
