atomsmasher

Reputation: 745

R add to a list in a loop, using conditions

I have a data.frame with dimensions (200, 500).

I want to run shapiro.test on each column of my data frame and append the result to a list. This is what I'm trying:

 colstoremove <- list();
 for (i in range(dim(I.df.nocov)[2])) {
     x <- shapiro.test(I.df.nocov[1:200,i])
     colstoremove[[i]] <- x[2]
    }

However this is failing. Some pointers? (background is mainly python, not much of an R user)

Upvotes: 0

Views: 58

Answers (2)

user31264

Reputation: 6737

Here is what happens in

for (i in range(dim(I.df.nocov)[2]))

For the sake of example, I assume that I.df.nocov contains 100 rows and 5 columns.

dim(I.df.nocov) is the vector of I.df.nocov dimensions, i.e. c(100, 5)

dim(I.df.nocov)[2] is the 2nd dimension of I.df.nocov, i.e. 5

range(x) is a 2-element vector containing the minimum and maximum values of x. For example, range(c(4,10,1)) is c(1,10). So range(dim(I.df.nocov)[2]) is c(5,5).

Therefore, the loop iterates twice: the first time with i=5, and the second time also with i=5. Columns 1 to 4 are never tested, so it is not surprising that it fails!

The problem is that R's range function and Python's function of the same name do completely different things. The closest R equivalent of Python's range is seq, although seq counts from 1 and includes its endpoint. For example, seq(5) is c(1,2,3,4,5), seq(3,5) is c(3,4,5), and seq(1,10,2) is c(1,3,5,7,9). You may also write 1:n, which is the same as seq(n), and m:n, which is the same as seq(m,n) (but the precedence of ':' is very high, so 1:2*x is interpreted as (1:2)*x).
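The difference is easy to check in an R session. The following is a minimal sketch using only base R. The variable names are illustrative, and seq_len() is an addition not mentioned above: it is a base R function that returns an empty sequence for 0, which makes it a safer loop index than 1:n when n might be zero.

```r
# R's range() returns c(min, max) of its argument, not a sequence.
r <- range(c(4, 10, 1))   # 1 10
r_single <- range(5)      # 5 5

# Sequence generators closest to Python's range():
s1 <- seq(5)              # 1 2 3 4 5
s2 <- seq(3, 5)           # 3 4 5
s3 <- seq(1, 10, 2)       # 1 3 5 7 9
s4 <- seq_len(0)          # empty, so a for loop over it runs zero times
s5 <- 1:0                 # 1 0, counts down instead of being empty

# ':' binds tighter than '*'.
p1 <- 1:2 * 3             # (1:2) * 3, i.e. 3 6
p2 <- 1:(2 * 3)           # 1 2 3 4 5 6
```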

Generally, if something does not work in R, print the subexpressions one at a time, from the innermost to the outermost. If a subexpression is too big to print, use str(x) (str stands for "structure"). And never assume that functions in Python and R behave the same! A function with the same name may do something quite different.
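Applied to the loop header from the question, that inspection might look like the sketch below. df_demo is a hypothetical stand-in for I.df.nocov, filled with random values purely for illustration.

```r
# Hypothetical stand-in for I.df.nocov: 100 rows, 5 columns of random normal values.
set.seed(1)
df_demo <- as.data.frame(matrix(rnorm(100 * 5), nrow = 100, ncol = 5))

# Work outward from the innermost subexpression.
d <- dim(df_demo)         # 100 5
print(d)
n <- d[2]                 # 5
print(n)
idx <- range(n)           # 5 5, not 1 2 3 4 5: this is where the loop goes wrong
print(idx)

# For objects too large to print, str() gives a compact summary.
str(df_demo)
```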

On a side note, instead of dim(I.df.nocov)[2] you could just write ncol(I.df.nocov) (there is also a function nrow).
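Putting these points together, one possible corrected version of the loop is sketched below. It assumes the goal is to keep one p-value per column. df_demo is a hypothetical stand-in for I.df.nocov, and seq_len() is used here as the index generator (equivalent to seq(n) for n of at least 1). Note that x[2] in the question keeps a one-element list, whereas res$p.value extracts the number itself.

```r
# Hypothetical stand-in for I.df.nocov (the real one is 200 x 500).
set.seed(42)
df_demo <- as.data.frame(matrix(rnorm(200 * 10), nrow = 200, ncol = 10))

# Preallocate one slot per column.
pvals <- vector("list", ncol(df_demo))

# seq_len(ncol(...)) yields 1, 2, ..., ncol, which is what the original loop intended.
for (i in seq_len(ncol(df_demo))) {
  res <- shapiro.test(df_demo[[i]])
  pvals[[i]] <- res$p.value   # the p-value as a plain number
}
```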

Upvotes: 1

Parfait

Reputation: 107652

Consider lapply(): a data frame is a list of columns, so lapply() applies the function to each column and returns a list with one element per column:

colstoremove <- lapply(I.df.nocov, function(col) shapiro.test(col)[2])
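Since the question title mentions conditions, the same idea can be extended to pick out columns by p-value. This is only a sketch under assumptions: the 0.05 cutoff and the rule "remove columns that fail the test" are guesses at the intent, df_demo is a hypothetical stand-in for I.df.nocov, and vapply() is used in place of lapply() so the p-values come back as a named numeric vector.

```r
# Hypothetical stand-in for I.df.nocov.
set.seed(7)
df_demo <- data.frame(
  normal  = rnorm(200),
  skewed  = rexp(200),      # exponential draws, clearly non-normal
  uniform = runif(200)
)

# One p-value per column, returned as a named numeric vector.
pvals <- vapply(df_demo, function(col) shapiro.test(col)$p.value, numeric(1))

# Assumed rule: flag columns whose p-value falls below 0.05.
alpha <- 0.05
colstoremove <- names(pvals)[pvals < alpha]

# Keep the remaining columns.
df_kept <- df_demo[, !(names(df_demo) %in% colstoremove), drop = FALSE]
```

Here colstoremove ends up as a character vector of column names rather than a list, which is usually easier to use for subsetting.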

Upvotes: 2
