user5363938

Reputation: 841

Error in colMeans(adult_csv[1], na.rm = TRUE) : 'x' must be numeric

I am trying to obtain the mean of the age field of this dataset. I have cleaned it, but when I run

colMeans(adult_csv[1], na.rm = TRUE)

it complains with:

Error in colMeans(adult_csv[1], na.rm = TRUE) : 'x' must be numeric

I have checked adult_csv[1] and it correctly gives me the age feature. Also, there is no x or missing data in it.

Upvotes: 0

Views: 3034

Answers (1)

detroyejr

Reputation: 1154

If you're using read.csv, there are some characters in the age column that cause R to read the whole column as a character vector (or a factor, depending on your R version and settings) instead of numeric. For colMeans to work, all the data needs to be of class numeric.

First look at:

adult_csv[is.na(as.numeric(as.character(adult_csv[[1]]))), 1]

There are a bunch of "?" values, which R can't use when calculating a mean. These should be NA values anyway, since "?" is just a placeholder for missing data. When you convert this column to numeric, R will give you a warning that it can't convert "?" into a number. Instead it will use NA, but that's probably a good result in your case.

Use:

adult_csv[[1]] <- as.numeric(as.character(adult_csv[[1]]))
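
To see the whole sequence in one place, here is a minimal sketch on a small made-up data frame. The values and the workclass column are assumptions for illustration, not your real data:

```r
# Hypothetical stand-in for the real data; "?" marks a missing value.
adult_csv <- data.frame(
  age = c("39", "50", "?", "38"),
  workclass = c("Private", "Private", "State-gov", "?"),
  stringsAsFactors = FALSE
)

# Reproduce the error: the age column is character, not numeric.
res <- try(colMeans(adult_csv[1], na.rm = TRUE), silent = TRUE)
inherits(res, "try-error")

# Find the entries that cannot be parsed as numbers.
# [[1]] extracts the column as a vector; [1] would return a data frame.
bad <- is.na(suppressWarnings(as.numeric(as.character(adult_csv[[1]]))))
adult_csv[bad, 1]

# Convert the column; "?" becomes NA (the coercion warning is silenced here).
adult_csv[[1]] <- suppressWarnings(as.numeric(as.character(adult_csv[[1]])))

# colMeans now works and skips the NA because of na.rm = TRUE.
age_mean <- colMeans(adult_csv[1], na.rm = TRUE)
age_mean
```

The as.character step only matters if the column was read in as a factor; on a plain character column it does nothing.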

When you're importing data, just take a moment to look for this kind of thing and learn what the error messages mean. There are also lots of other questions on Stack Overflow that answer this same question.
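
One way to catch this at import time is to tell read.csv which strings mean "missing" through its na.strings argument. This is only a sketch: the inline CSV text below is a made-up stand-in for your file, and I'm assuming the placeholder is exactly "?":

```r
# Hypothetical inline CSV standing in for the real file.
csv_text <- "age,workclass\n39,Private\n50,?\n?,State-gov\n38,Private"

# Treat "?" as NA while reading, so age is parsed as a number.
adult_csv <- read.csv(text = csv_text, na.strings = "?")

# Inspect the column classes right after import.
str(adult_csv)

age_mean <- colMeans(adult_csv[1], na.rm = TRUE)
age_mean
```

With a real file you would pass the file path instead of text = csv_text.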

Hopefully that makes sense.

Upvotes: 1
