JoeDanger

Reputation: 3575

Subtract one column from another in data frame

I've got a data frame, df, with the following columns:

> names(df)
[1] "survived"        "sex"             "age"            
[4] "pclass"          "sibsp"           "predict.t_tree."

How do I do an element-wise subtraction of predict.t_tree from survived? It would be nice if I could just have the result as an array or something and not update the data frame itself.

Here's some example data:

> typeof(df$survived)
[1] "integer"

> head(df$survived,5)
[1] 1 1 0 0 0

> typeof(df$predict.t_tree)
[1] "integer"

> head(df$predict.t_tree,5)
[1] 1 0 1 0 1
Levels: 0 1

The following code just gives a warning instead of the differences:

> df$survived - df$predict.t_tree


Warning message:
In Ops.factor(df$survived, df$predict.t_tree) : - not meaningful for factors

Upvotes: 1

Views: 16454

Answers (3)

Ricardo Saporta

Reputation: 55340

Let's look at the output:

> typeof(df$survived)
[1] "integer"

> head(df$survived,5)
[1] 1 1 0 0 0

> typeof(df$predict.t_tree)
[1] "integer"

> head(df$predict.t_tree,5)
[1] 1 0 1 0 1
Levels: 0 1    <~~~~~ **** NOTICE HERE **** 

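When you see "Levels: ..." in the output, the vector (or column) is a factor, not a character or numeric vector. Note that typeof() still reports "integer", because a factor is stored as integer codes plus a levels attribute; use class() or is.factor() to check instead. If you expect anything other than a factor, you must convert it, generally with as.character(.) first. (Be very cautious about using as.numeric(.) directly on a factor: it returns the internal level codes rather than the values shown, which is rarely what you want.)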
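To make that caution concrete, here is a minimal sketch on made-up factors (f and p are hypothetical, not the asker's data). It shows that as.numeric() on a factor gives the level codes, while going through as.character() gives the labels as numbers:

```r
# Hypothetical factor whose labels are numbers that differ from the internal codes
f <- factor(c("10", "20", "10", "30"))

codes_f  <- as.numeric(f)                # internal level codes: 1 2 1 3
values_f <- as.numeric(as.character(f))  # the labels as numbers: 10 20 10 30

# With levels "0" and "1" the codes are 1 and 2, so direct conversion is off by one
p <- factor(c(1, 0, 1, 0, 1))
codes_p  <- as.numeric(p)                # 2 1 2 1 2
values_p <- as.numeric(as.character(p))  # 1 0 1 0 1
```

In the 0/1 case the mistake is easy to miss, because every value just comes out shifted by one.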


Once converted, the pairwise manipulation is a cinch:

df$predict.t_tree <- as.numeric(as.character(df$predict.t_tree))

# Then, this will give you what you are after
df$survived - df$predict.t_tree
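The question asked for the result as a separate vector without updating the data frame. A sketch of that variant, assuming a small stand-in df built from the first five values shown in the question (the real data will differ), converts the factor on the fly and leaves the column alone:

```r
# Made-up stand-in for the asker's data frame (first five values from the question)
df <- data.frame(
  survived        = c(1L, 1L, 0L, 0L, 0L),
  predict.t_tree. = factor(c(1, 0, 1, 0, 1))
)

# Convert inside the expression; df itself is never assigned to
diffs <- df$survived - as.numeric(as.character(df$predict.t_tree.))

diffs                          # 0 1 -1 0 -1
is.factor(df$predict.t_tree.)  # TRUE: the column is still a factor
```

The full column name from names(df) ends in a dot; df$predict.t_tree also works because $ does partial matching on data frame column names.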

Upvotes: 0

marbel

Reputation: 7714

Try the following with your data:

# survived is already an integer; predict.t_tree is the factor that needs converting
df$survived - as.numeric(as.character(df$predict.t_tree))

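EDIT: Added a small example

# stringsAsFactors = TRUE is needed on R >= 4.0 to get a factor column here
df <- data.frame(x = c("1", "2", "3"),
                 y = 1:3,
                 stringsAsFactors = TRUE)

str(df)
# 'data.frame': 3 obs. of  2 variables:
#  $ x: Factor w/ 3 levels "1","2","3": 1 2 3
#  $ y: int  1 2 3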

The x column is a factor. To do arithmetic on it, coerce it to numeric, going through character so that you get the labels rather than the internal level codes.

as.numeric(as.character(df$x)) - df$y

This is also answered in the R FAQ, section 7.10 ("How do I convert factors to numeric?").
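For reference, the FAQ entry and ?factor also describe an alternative idiom, as.numeric(levels(f))[f], which converts each distinct level once and then indexes by the factor. A small sketch on a made-up factor f (hypothetical, not from the question) showing that both forms agree:

```r
# Hypothetical factor with numeric-looking labels
f <- factor(c("5", "10", "20", "5"))

via_levels    <- as.numeric(levels(f))[f]    # 5 10 20 5
via_character <- as.numeric(as.character(f)) # 5 10 20 5

identical(via_levels, via_character)         # TRUE
```

For a short column like the one in the question either form is fine; the levels-based one is mainly of interest for long vectors with few distinct levels.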

Upvotes: 2

crogg01

Reputation: 2516

df$predict.t_tree was created as a factor, so convert it to numeric first:

df$predict.t_tree = as.numeric(as.character(df$predict.t_tree))
df$survived - df$predict.t_tree

Upvotes: 0
