Reputation: 1
I have a dataframe with 8,000 genes expression values in rows and 40 columns. The 40 columns represent observations from 4 different time points. The first ten columns are from no treatment, the second ten are after one week of treatment, the third column is after 2 weeks of treatment, and the fourth column is after 7 weeks of treatment.
This is the idea (smaller ex edit):
m1 <- matrix(data=1:32, nrow = 8, ncol = 4)
time <- c(1:4)
summaries <- data.frame(nrow=8, ncol=4)
pvalues <- function(x) {
for (i in (1:8)) {
raw <- unlist(as.vector(x[i,]))
lm <- lm(raw ~ time)
summaries[i, ] <- summary(lm)$coefficients[,4]
return(summaries)
}
}
pvalues(m1)
I know it is still overwriting but I don't know how to fit ix.
Upvotes: 0
Views: 166
Reputation: 145765
Okay I ran your code and got
Warning message:
In summary.lm(lm) : essentially perfect fit: summary may be unreliable
This is because the sample data was sequential. I changed m1
to use rnorm(32)
instead of 1:32
, and that problem is solved. (Though this problem is specific to the example, not the real case, I would assume.)
Next problem: the size of the summaries
object. You used data.frame
which takes column names as arguments. We want to use matrix
which lets you set the rows and columns:
## Bad - 1 row and 2 columns
summaries <- data.frame(nrow=8, ncol=4)
summaries
# nrow ncol
# 1 8 4
## Good - 8 rows, 2 columns
summaries <- matrix(nrow = 8, ncol = 2)
# summaries
# [,1] [,2]
# [1,] NA NA
# [2,] NA NA
# [3,] NA NA
# [4,] NA NA
# [5,] NA NA
# [6,] NA NA
# [7,] NA NA
# [8,] NA NA
But then, running your code still only the first row is filled! Why? A function stops as soon as it hits a return()
, thinking it is done. We need to move return(summaries)
to after the for
loop, not inside the for
loop. Putting it all together:
m1 <- matrix(data = rnorm(32),
nrow = 8,
ncol = 4)
time <- c(1:4)
summaries <- matrix(nrow = 8, ncol = 2)
pvalues <- function(x) {
for (i in (1:8)) {
raw <- c(x[i, ])
lm <- lm(raw ~ time)
summaries[i,] <- summary(lm)$coefficients[, 4]
} ## End the for loop before the `return()`
return(summaries)
}
pvalues(m1)
# [,1] [,2]
# [1,] 0.6235167 0.5461115
# [2,] 0.4256698 0.3992509
# [3,] 0.3041439 0.2751724
# [4,] 0.8087557 0.8252432
# [5,] 0.8820501 0.1812292
# [6,] 0.4997327 0.5582880
# [7,] 0.5589398 0.8150613
# [8,] 0.6283059 0.8994896
We can gussy it up a little bit by assigning column names based on the last iteration summary:
pvalues <- function(x) {
for (i in (1:8)) {
raw <- c(x[i, ])
lm <- lm(raw ~ time)
summaries[i,] <- summary(lm)$coefficients[, 4]
}
colnames(summaries) = names(summary(lm)$coefficients[, 4])
return(summaries)
}
pvalues(m1)
# (Intercept) time
# [1,] 0.6235167 0.5461115
# [2,] 0.4256698 0.3992509
# [3,] 0.3041439 0.2751724
# [4,] 0.8087557 0.8252432
# [5,] 0.8820501 0.1812292
# [6,] 0.4997327 0.5582880
# [7,] 0.5589398 0.8150613
# [8,] 0.6283059 0.8994896
Upvotes: 1