jj2593

Reputation: 77

mean() returning error "argument is not numeric or logical: returning NA" but only for some columns in data frame?

I'm pretty new to R, so maybe this is something obvious, but I'm not sure what's going on. I load in a file that has a bunch of data, which I then split up into separate data frames. They look something like:

      V3    V4    V5    V6    V7    V8    V9   V10   V11    V12    V13    V14
3  1.000     2     3     4     5     6 7.000 8.000 9.000 10.000 11.000 12.000
4  0.042 0.067 0.292 0.206 0.071 0.067 0.040 0.063 0.059  0.040  0.066  0.040
5  0.043 0.172 0.179 0.199 0.073 0.067 0.040 0.062 0.058  0.039  0.066  0.039
6  0.040 0.066  0.29 0.185 0.072 0.067 0.040 0.062 0.058  0.039  0.065  0.039
7  0.039 0.068 0.291 0.189 0.075 0.069 0.040 0.064 0.058  0.041  0.064  0.039
8  0.042 0.063 0.271 0.191  0.07 0.068 0.040 0.065 0.058  0.041  0.066  0.040
9  0.041 0.067 0.342 0.199 0.069 0.066 0.041 0.065 0.057  0.040  0.065  0.042
10 0.044 0.064 0.295 0.198 0.069 0.067 0.039 0.064 0.057  0.040  0.067  0.041
11 0.041 0.067  0.29 0.211 0.066 0.067 0.043 0.056 0.058  0.042  0.067  0.042

I'm trying to find the means of rows 4-6 and 7-9 for each column. I have each data frame in a list called "plates". When I use the line:

plates[[1]][2:4, 7]

I end up with the output:

[1] 0.04 0.04 0.04

If I wrap the code above in mean(), it works fine for columns 7 and higher. However, when I use the same code for columns lower than 7, say column 2, I end up with:

[1] 0.067 0.172 0.066
57 Levels:  0.063 0.064 0.066 0.067 0.068 0.069 0.07 0.071 0.072 0.08 0.081 0.082 0.083 0.084 0.085 ... PlateFormat

I have no idea what this 57 Levels: thing is but I'm assuming this is my problem. I only want the mean of the 3 numbers (0.067, 0.172, 0.066) but this 57 Levels being returned appears to be causing mean() to give me the error in the title. Any help with this would be greatly appreciated.

Upvotes: 1

Views: 83

Answers (1)

Ben Bolker

Reputation: 226961

There is an entry somewhere in that column that can't be parsed as a number, so read.csv() (or whatever you used) is reading the whole column in as a factor. It could be a typo (something as simple as an extra decimal point or a trailing comma), a missing-value code such as "?", or a stray text entry (the levels in your output include "PlateFormat").
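To see how one unparseable entry leads to the warning in the title, here is a minimal sketch. The toy vector `x` is made up for illustration (it is not your data), and it is built with factor() directly so that it behaves the same regardless of R version (from R 4.0.0 on, read.csv() returns character columns rather than factors by default).

```r
# Hypothetical toy column: three numbers plus one text entry, stored as a factor
x <- factor(c("0.067", "0.172", "0.066", "PlateFormat"))

is.numeric(x)                            # FALSE: a factor is not numeric

# mean() on a factor returns NA with the
# "argument is not numeric or logical" warning
m <- suppressWarnings(mean(x[1:3]))

# as.numeric() on a factor gives the integer level codes, not the values
codes <- as.numeric(x[1:3])

# Going through character first recovers the printed values
vals <- as.numeric(as.character(x[1:3]))
mean(vals)
```

The subset `x[1:3]` still carries all the levels of the full column, which is why the "Levels:" line shows up even when only three values are printed.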

You can use

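# go through character first: as.numeric() on a factor returns the level codes
numify <- function(x) as.numeric(as.character(x))
mydata[] <- lapply(mydata, numify)  # mydata[] keeps the data frame structure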

to convert by brute force, but it would be better to use

bad_vals <- function(x) {
    # keep entries that are not missing but turn into NA when converted
    x[!is.na(x) & is.na(numify(x))]
}
lapply(mydata, bad_vals)

to identify what the bad values are, so you can fix them upstream in your data file (or add missing-value codes to the na.strings= argument in your input code).
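As a rough end-to-end sketch of both steps: the file contents, the column names and the "?" and "n/a" codes below are invented for illustration, and the data is passed in through read.csv(text = ...) so the example runs on its own. The suppressWarnings() call is an addition here, only to silence the "NAs introduced by coercion" warnings that the conversion otherwise emits.

```r
numify <- function(x) as.numeric(as.character(x))

bad_vals <- function(x) {
    num <- suppressWarnings(numify(x))   # NA wherever conversion fails
    x[!is.na(x) & is.na(num)]            # non-missing entries that failed
}

# Hypothetical file contents with two stray missing-value codes
csv <- "V4,V5\n0.067,0.292\n?,0.179\n0.066,n/a\n"

# Step 1: read as-is and list the offending entries per column
raw <- read.csv(text = csv, stringsAsFactors = TRUE)
found <- lapply(raw, bad_vals)
found

# Step 2: declare those codes as missing on input,
# so that the columns are read as numeric
clean <- read.csv(text = csv, na.strings = c("NA", "?", "n/a"))
sapply(clean, is.numeric)
colMeans(clean, na.rm = TRUE)
```

With the real file, you would replace text = csv with the file name, and only add a code to na.strings= once you have confirmed that it really does mean "missing".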

Upvotes: 2
