Reputation: 3575
I've got a data frame, df, with the following columns:
> names(df)
[1] "survived" "sex" "age"
[4] "pclass" "sibsp" "predict.t_tree."
How do I do an element-wise subtraction of predict.t_tree from survived? It would be nice if I could just have the result as an array or something and not update the data frame itself.
Here's some example data:
> typeof(df$survived)
[1] "integer"
> head(df$survived,5)
[1] 1 1 0 0 0
> typeof(df$predict.t_tree)
[1] "integer"
> head(df$predict.t_tree,5)
[1] 1 0 1 0 1
Levels: 0 1
The following code just gives a warning:
> df$survived - df$predict.t_tree
Warning message:
In Ops.factor(df$survived, df$predict.t_tree) : - not meaningful for factors
Upvotes: 1
Views: 16454
Reputation: 55340
Let's look at the output:
> typeof(df$survived)
[1] "integer"
> head(df$survived,5)
[1] 1 1 0 0 0
> typeof(df$predict.t_tree)
[1] "integer"
> head(df$predict.t_tree,5)
[1] 1 0 1 0 1
Levels: 0 1 <~~~~~ **** NOTICE HERE ****
When you see "Levels: ____", that tells you that the vector (or column) is a factor and not a string or a number. If you are expecting anything other than a factor, you must convert it, generally with as.character(.) first. (Be very cautious about using as.numeric(.) directly on a factor: it returns the internal level codes, which is likely not the result you are after.)
Once converted, the pairwise manipulation is a cinch:
df$predict.t_tree <- as.numeric(as.character(df$predict.t_tree))
# Then, this will give you what you are after
df$survived - df$predict.t_tree
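To illustrate the caution about as.numeric(.), here is a minimal sketch on a made-up factor (the name pred and its values are assumptions for illustration, not the asker's actual data):

```r
# Toy factor mimicking 0/1 predictions; values are invented for illustration
pred <- factor(c("1", "0", "1", "0", "1"))

# Directly converting a factor returns its internal level codes
codes <- as.numeric(pred)                 # 2 1 2 1 2

# Going through as.character() first recovers the labels as numbers
values <- as.numeric(as.character(pred))  # 1 0 1 0 1
```

The levels are sorted as "0", "1", so the label "0" is stored as code 1 and "1" as code 2, which is why the direct conversion is off by one here.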
Upvotes: 0
Reputation: 7714
Try the following with your data:
df$survived - as.numeric(as.character(df$predict.t_tree))
EDIT: Added a small example
df <- data.frame(x = c("1", "2", "3"),
                 y = 1:3,
                 stringsAsFactors = TRUE)  # needed in R >= 4.0, where the default is FALSE
str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ x: Factor w/ 3 levels "1","2","3": 1 2 3
# $ y: int 1 2 3
The x column is a factor. You have to convert it to numeric, going through as.character() so that you get the labels rather than the internal level codes, before you can do arithmetic on it.
as.numeric(as.character(df$x)) - df$y
This is also answered in the R FAQ, section 7.10.
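For reference, the R FAQ entry also mentions a second idiom that indexes the converted levels by the factor's integer codes. A small sketch, using a made-up factor f (the name and values are assumptions for illustration):

```r
# Invented factor whose labels differ from their internal codes
f <- factor(c("10", "20", "10", "30"))

# Convert the (few) levels once, then look each element up by its code
via_levels <- as.numeric(levels(f))[as.integer(f)]  # 10 20 10 30

# Same result as the as.character() route
via_chars <- as.numeric(as.character(f))            # 10 20 10 30
```

Both forms give the same values; the levels-based one only converts each distinct level once, which may matter on long vectors.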
Upvotes: 2
Reputation: 2516
df$predict.t_tree was created as a factor, so convert it first:
df$predict.t_tree = as.numeric(as.character(df$predict.t_tree))
df$survived - df$predict.t_tree
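If, as the question asks, the data frame itself should stay untouched, the conversion can be done inline and the result kept in a separate vector. A minimal sketch on a toy data frame (the column values and the name diffs are assumptions for illustration):

```r
# Toy stand-in for the asker's data frame; values are invented
df <- data.frame(
  survived       = c(1L, 1L, 0L, 0L, 0L),
  predict.t_tree = factor(c(1, 0, 1, 0, 1))
)

# Convert on the fly; nothing is assigned back into df
diffs <- df$survived - as.numeric(as.character(df$predict.t_tree))
diffs  # 0 1 -1 0 -1

# The original column is still a factor
is.factor(df$predict.t_tree)  # TRUE
```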
Upvotes: 0