Reputation: 613

apply() does not work as expected

I'm trying to get a handle on how the apply function works. Here is what I tried:

df = data.frame(x=c(1,2,3,4,5), x2=c(1,2,3,4,5))
apply(df$x2, 2, function(x) x*2) #doesn't work
apply(df["x2"], 2, function(x) x*2) #works
apply(df[,2], 2, function(x) (x*2)) #doesn't work
apply(df[2], 2, function(x) x*2) #works (surprisingly)
apply(df[2,], 1, function(x) x*2) #works, but gives me vertical vector
apply(df[2,], 2, function(x) x*2) #works; this gives me the output I expected in line above

Questions (as indicated by the comments):

Why doesn't line 2 work although line 3 does?
Why can I use [2,] to refer to row 2 (line 6), but not [,2] to refer to column 2 (line 4)? Why do I have to use [2] (line 5) instead?
In line 6 I expected to get what I got from line 7: row 2 (with doubled values) as a row. Why didn't I get this from line 6, where I indicated rows with MARGIN=1?

Upvotes: 3

Answers (4)

Miff

Reputation: 7951

Why doesn't line 2 work although line 3 does?

df$x2 is a vector, i.e. c(1,2,3,4,5), whereas df["x2"] is a data frame with just one column. The vector has no second dimension to apply over. See ?'[' in R for details of how subsetting works; this isn't really related to the apply function.
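A minimal sketch of that difference, reusing the data frame from the question. The names v, d and res are only illustrative, and tryCatch is used here just to capture the error instead of stopping the script:

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))

v <- df$x2     # numeric vector
d <- df["x2"]  # one-column data frame

dim(v)  # NULL: a plain vector has no dim attribute
dim(d)  # 5 1

# apply() needs dim(X) to have positive length, so the vector call fails;
# the error message is captured as a string rather than aborting
res <- tryCatch(apply(v, 2, function(x) x * 2),
                error = function(e) conditionMessage(e))
res

# the data frame call succeeds and returns a 5 x 1 matrix
apply(d, 2, function(x) x * 2)
```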

Why can I use [2,] to refer to row 2 (line 6), but cannot use [,2] to refer to column 2 (line 4), but have to use [2] (line 5) instead?

Again, see the subsetting help page. When a single column is selected, df[,2] drops the result to a vector by default, so df[,2,drop=FALSE] is probably what you need.
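As a small illustration of the drop argument (a sketch on the question's data; doubled is just an illustrative name):

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))

dim(df[, 2])                # NULL: the single column was dropped to a vector
dim(df[, 2, drop = FALSE])  # 5 1: still a data frame

# with drop = FALSE the column-wise apply() call works
doubled <- apply(df[, 2, drop = FALSE], 2, function(x) x * 2)
doubled[, "x2"]  # 2 4 6 8 10
```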

In line 6 I expected to get what I got from line 7: row 2 (with doubled values) as a row. Why didn't I get this from line 6, where I indicated rows with MARGIN=1?

The Value section of ?apply explains the dimensions that you can expect as output from a call to apply:

If each call to FUN returns a vector of length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1. If n equals 1, apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise.

In this case we see that:

> dim(df[2,]) # [1] 1 2

and so:

apply(df[2,], 1, function(x) x*2)

has n=2 and dim(df[2,])[1]=1, so you should expect an output with dimensions c(2,1).
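The rule can be checked directly. This is only a sketch on the question's data; row2, by_row and by_col are illustrative names:

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))
row2 <- df[2, ]  # 1 row, 2 columns

by_row <- apply(row2, 1, function(x) x * 2)  # one call, returning a length-2 vector
by_col <- apply(row2, 2, function(x) x * 2)  # two calls, each returning a length-1 value

dim(by_row)  # 2 1: c(n, dim(X)[MARGIN]) with n = 2
dim(by_col)  # NULL: n = 1 and MARGIN has length 1, so a plain vector

# transposing the first result gives the "row" orientation
t(by_row)
```

If a row layout is what you want from the MARGIN = 1 call, t() on the result is one simple option.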

Upvotes: 1

Andre Elrico

Reputation: 11500

apply needs to be used on something whose dim has positive length. Put simply: an object that has rows and columns.

That's why MARGIN takes 1 or 2, standing for row-wise and column-wise operation.

Check your input values like this:

dim(df["x2"])
dim(df[,2]) #this is NULL, so it does not work

df[,2] gives you a vector, the same as df$x2. A vector does not have rows and columns, so it does not work with apply.
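If you do end up with a plain vector, two possible ways forward are sketched below (v and m are illustrative names; which option fits depends on what you need afterwards):

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))
v <- df[, 2]

# Option 1: skip apply(); arithmetic is already element-wise on a vector
v * 2

# Option 2: give the vector dimensions first (as.matrix() makes a 5 x 1 matrix)
m <- as.matrix(v)
dim(m)  # 5 1
apply(m, 2, function(x) x * 2)
```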

In order to understand what you are doing wrong:

Type ?"[" into your console and read everything. Also keep playing around, as you are already doing!

Have a closer look at the drop argument.

Lastly, with df[2,] you're subsetting a single row. It's still a data frame. Check dim(df[2,]).

apply(df[2,], 1, function(x) x*2) #works, but gives me vertical vector
apply(df[2,], 2, function(x) x*2) #works; this gives me the output I expected in line above

The reason you don't get the same output is the WHOLE reason why apply exists. Please read ?apply to understand.

If you have questions after reading the two resources mentioned, feel free to ask more.

Here is a little example:

m <- matrix(1:9, nrow = 3)
m
apply(m, 1, max) #row-wise max value: 7 8 9
apply(m, 2, max) #col-wise max value: 3 6 9

Upvotes: 3

clemens

Reputation: 6813

The problem is subsetting:

First: df$x2 and df[, 2] are different from df["x2"] and df[2]: the former return a numeric vector, the latter return a data.frame.

Second: df[2, ] returns the second row of your data.frame. If you use MARGIN = 1 you go through the rows, and each row is represented as a (named) vector of length equal to the number of columns in your data.frame. If you use MARGIN = 2 you go through the columns, and again each column is represented as a (named) vector of length equal to the number of rows in your data.frame.
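To see what the function actually receives under each MARGIN, a quick sketch on the full data frame from the question (row_names_seen is an illustrative name):

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))

# MARGIN = 1: FUN is called once per row with a vector named after the columns
row_names_seen <- apply(df, 1, function(v) paste(names(v), collapse = ","))
unique(row_names_seen)  # "x,x2"

# length of what FUN receives under each MARGIN
apply(df, 1, length)  # 2 for every row (number of columns)
apply(df, 2, length)  # 5 for every column (number of rows)
```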

Upvotes: 1

RobJan

Reputation: 1441

You should look at the type and dimensions of each expression:

> typeof(df$x2)
[1] "double"
> dim(df$x2)
NULL

> typeof(df["x2"])
[1] "list"
> dim(df["x2"])
[1] 5 1

> typeof(df[, 2])
[1] "double"
> dim(df[, 2])
NULL

> typeof(df[2])
[1] "list"
> dim(df[2])
[1] 5 1

> typeof(df[2, ])
[1] "list"
> dim(df[2,])
[1] 1 2

Line 2 does not work because you are trying to apply a function to a variable whose dimension is NULL (dim(X) must have positive length). The rest is similar. You must pay attention to the type of the expression you pass to apply. I recommend simply printing the values to check whether they are suitable for the apply function.
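One way to do that check in a single step is a small helper like the one below. Note that describe_input is a hypothetical function written for this answer, not something provided by base R:

```r
# Hypothetical helper (not part of base R): report type and dimensions
# of an object before passing it to apply()
describe_input <- function(x) {
  d <- dim(x)
  list(
    type = typeof(x),
    dims = d,
    apply_ready = length(d) > 0  # apply() requires dim(X) to have positive length
  )
}

df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))
describe_input(df$x2)$apply_ready     # FALSE
describe_input(df["x2"])$apply_ready  # TRUE
```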

Upvotes: 0
