Factor analysis using R over sequential groups of columns in df

Question

I have a df with 10,000 columns (SNP frequencies). I need to carry out a simulation (factor analysis) with non-repeating vectors. To do this, I need to run factor analysis on subsets of columns, divided into groups of 10, for example cols 1:10, 11:20, 21:30. Since specifying this manually would take ages, I need a simple script that does it. I wrote the following, but it does not seem to work. I cannot figure out how to tell R where to start and stop each iteration.

ind=seq(1,(ncol(df)-10),by=10)

for (i in ind) { start=i;end=i+9; rez = factanal(df,factors=1, start:end)  }
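For comparison, here is a minimal corrected sketch of the loop above. It assumes hypothetical simulated data (30 columns driven by one latent factor) in place of the real SNP table. It fixes three things: the sequence end (ncol(df) - 10 drops the last complete block), the factanal() call (its third positional argument is data, so start:end does not select columns), and the fact that rez was overwritten on every iteration.

```r
# Hypothetical stand-in for the real data: 200 rows, 30 columns that all
# load on one simulated latent factor
set.seed(1)
n <- 200
f <- rnorm(n)
df <- as.data.frame(replicate(30, 0.8 * f + rnorm(n, sd = 0.6)))

# Start index of each block of 10 columns: 1, 11, 21
# (ending at ncol(df) - 9 keeps the last complete block;
#  ending at ncol(df) - 10 would drop it)
ind <- seq(1, ncol(df) - 9, by = 10)

# One list slot per block, so earlier results are not overwritten
rez <- vector("list", length(ind))
for (k in seq_along(ind)) {
  start <- ind[k]
  end <- start + 9
  # Subset the columns first, then fit a one-factor model to that block
  rez[[k]] <- factanal(df[, start:end], factors = 1)
}
```

With the real data, rez would hold 1,000 fits, one per block of 10 columns.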

coffeinjunky · Accepted Answer

Just a small pointer:

 groups <- seq(from=1, to=10000, by=10)

This may be useful for splitting your columns into groups of 10. Then, for each element of groups, add 0:9 to get the ten column indices of that group. See:

> 1 + 0:9
 [1]  1  2  3  4  5  6  7  8  9 10

This can be used to subset your data frame.

For instance,

for(i in groups){
  your_function( dat[, i + 0:9] )
}

will execute your function on the corresponding block of columns. Make sure to store the output of the function appropriately, since a for loop discards the value computed in its body unless you assign it. It may be useful to wrap it in an lapply call, as in

 lapply(groups, function(x) your_function(dat[, x + 0:9]))

to save the output in a list.
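Applied to factor analysis, the lapply pattern might look like the sketch below. The data are hypothetical and simulated (40 columns sharing one latent factor), and groups is computed from ncol(dat) rather than hard-coded to 10000. The last step collects the one-factor loadings into a matrix with one column per block, which is one possible way to tabulate the results.

```r
# Hypothetical data: 300 rows, 40 columns sharing one latent factor
set.seed(42)
n <- 300
f <- rnorm(n)
dat <- as.data.frame(replicate(40, 0.7 * f + rnorm(n, sd = 0.7)))

groups <- seq(from = 1, to = ncol(dat), by = 10)  # 1, 11, 21, 31

# One factanal() fit per block of 10 columns, collected in a list
fits <- lapply(groups, function(x) factanal(dat[, x + 0:9], factors = 1))

# Loadings of each block's single factor: a 10 x 4 matrix, one column per block
loadings_by_block <- sapply(fits, function(fit) as.vector(loadings(fit)))
dim(loadings_by_block)
# [1] 10  4
```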

While this may answer your question, let me nevertheless add what I would do, since I think it may help you more in the long run: instead of looping over columns, I would melt the data frame into long format, create a new variable indexing the groups of 10, and then use that variable as a grouping variable with dplyr's group_by() for grouped analysis.
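The melt/dplyr pipeline is not sketched here. As a swapped-in, base-R illustration of the same idea (a grouping index, then one analysis per group), the index can be attached to the columns instead of to melted rows. Because factanal() expects one column per variable, this avoids reshaping each group back to wide format. The data are again hypothetical and simulated.

```r
# Hypothetical data: 250 rows, 30 columns sharing one latent factor
set.seed(7)
n <- 250
f <- rnorm(n)
dat <- as.data.frame(replicate(30, 0.75 * f + rnorm(n, sd = 0.65)))

# Grouping index for every column: 1 for cols 1-10, 2 for 11-20, 3 for 21-30
col_group <- (seq_along(dat) - 1) %/% 10 + 1

# split.default() treats the data frame as a list of columns,
# so each element of blocks is a 10-column data frame
blocks <- split.default(dat, col_group)

# One-factor model per group
fits <- lapply(blocks, factanal, factors = 1)
names(fits)
```

A side benefit of this design: if the number of columns were not a multiple of 10, the last (smaller) group would still be kept rather than silently dropped.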
