I have a dataset:
> x
   Treatment X1 X2
1         T1  6  7
2         T1  5  9
3         T1  8  6
4         T1  4  9
5         T1  7  9
6         T2  3  3
7         T2  1  6
8         T2  2  3
9         T3  2  3
10        T3  5  1
11        T3  3  1
12        T3  2  3
I am trying to find the means of columns X1 and X2 for treatment T1. If I run the code as-is, I get an error:
> t1 <- subset(x[2:3], x$Treatment=="T1")
> x_vec <- colMeans(t1, na.rm = TRUE)
Error in colMeans(t1, na.rm = TRUE) : 'x' must be numeric
So I convert X1 and X2 to numeric:
t1$X1 <- as.numeric(as.factor(t1$X1))
t1$X2 <- as.numeric(as.factor(t1$X2))
x_vec <- colMeans(t1, na.rm = TRUE)
But when I do that, I get the wrong result:
> x_vec
X1 X2
6.0 4.4
After the conversion with as.numeric(), t1 looks like this:
> t1
  X1 X2
1  6  4
2  5  5
3  8  3
4  4  5
5  7  5
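A quick hand check against the T1 rows of the original dataset shows that these stored values reproduce the reported means exactly, and that only the X2 mean differs from what the raw data give:

$$
\bar{X}_1 = \frac{6+5+8+4+7}{5} = 6.0,
\qquad
\bar{X}_2^{\text{stored}} = \frac{4+5+3+5+5}{5} = 4.4,
\qquad
\bar{X}_2^{\text{raw}} = \frac{7+9+6+9+9}{5} = 8.0
$$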
Why are the values in X2 changed after converting to numeric?
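This is a pretty common issue that newer R users hit. The issue is your use of as.factor followed by as.numeric. Running as.numeric on a factor returns the factor's internal integer codes, i.e. the position of each value's label in levels(), rather than converting the label itself to a number. In your data X2 has the levels "1", "3", "6", "7", "9", so 7 becomes 4, 9 becomes 5 and 6 becomes 3. X1 only looks right because its levels are 1 through 8, so each position happens to equal the value. You can either remove the call to as.factor (this only helps if the column is stored as character) or run as.character on the factor before calling as.numeric, e.g. as.numeric(as.character(t1$X2)). Since your columns are evidently already factors, the second option is the one you need.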
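A minimal reproduction of the problem and the fix, assuming X1 and X2 were stored as factors built from the full dataset (which is what the codes 4 5 3 5 5 in the question imply). The line using as.numeric(levels(f))[f] is the alternative idiom given in the R FAQ entry on converting factors to numeric; it converts each level once and then indexes by the codes.

```r
# Rebuild the question's data; storing X1 and X2 as factors is an assumption
x <- data.frame(
  Treatment = rep(c("T1", "T2", "T3"), times = c(5, 3, 4)),
  X1 = factor(c(6, 5, 8, 4, 7, 3, 1, 2, 2, 5, 3, 2)),
  X2 = factor(c(7, 9, 6, 9, 9, 3, 6, 3, 3, 1, 1, 3))
)
t1 <- subset(x[2:3], x$Treatment == "T1")

levels(t1$X2)                     # "1" "3" "6" "7" "9": levels of the full data
as.numeric(t1$X2)                 # 4 5 3 5 5: positions in levels(), not values
as.numeric(as.character(t1$X2))   # 7 9 6 9 9: convert the labels instead
as.numeric(levels(t1$X2))[t1$X2]  # 7 9 6 9 9: same result via the level table

# Convert through the labels, then take the column means
t1$X1 <- as.numeric(as.character(t1$X1))
t1$X2 <- as.numeric(as.character(t1$X2))
colMeans(t1, na.rm = TRUE)        # X1 = 6, X2 = 8
```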
Note that in R versions before 4.0.0, functions such as data.frame, as.data.frame and read.csv convert character columns to factors by default, which can cause problems like this one. Check out the stringsAsFactors argument for more info.
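A small sketch of what stringsAsFactors changes, using a hypothetical three-value column in place of the real data: setting it to TRUE mimics the pre-4.0.0 default, and the last lines show that only the factor version needs the extra as.character step.

```r
# Hypothetical values standing in for a column read as text
chr <- c("7", "9", "6")

# TRUE reproduces the old default of data.frame(); FALSE is the default since R 4.0.0
df_fac <- data.frame(Treatment = "T1", X2 = chr, stringsAsFactors = TRUE)
df_chr <- data.frame(Treatment = "T1", X2 = chr, stringsAsFactors = FALSE)

sapply(df_fac, class)                # both columns are "factor"
sapply(df_chr, class)                # both columns are "character"

as.numeric(df_chr$X2)                # 7 9 6: character converts directly
as.numeric(df_fac$X2)                # 2 3 1: level codes, since levels are "6" "7" "9"
as.numeric(as.character(df_fac$X2))  # 7 9 6: go through the labels first
```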