looping through data.frames

Question

I have 2 data.frames

> head(cont)
                    old_pert     cmap_name       conc   perturb_geo        t1        t2        t3        t4        t5
1 5202764005789148112904.A02     estradiol 0.00000001 GSM119257 GSM119218 GSM119219 GSM119221 GSM119222 GSM119223
2 5202764005789148112904.A01 valproic acid 0.00050000 GSM119256 GSM119218 GSM119219 GSM119221 GSM119222 GSM119223

> head(expression)[1:3,1:8]
          GSM118911 GSM118912 GSM118913 GSM118723 GSM118724 GSM118725 GSM118726 GSM118727
1007_s_at     387.6     393.2     290.5     378.6     507.8     383.7     288.8     451.9
1053_at        56.4      53.5      32.8      39.0      71.5      47.3      46.0      50.1
117_at          6.3      33.6      19.2      17.6      20.3      15.0       7.1      43.1

I want to apply a loop to do:

for(i in 1:nrow(cont)){

# first take some values from cont, which will be used below

vehicle <- cont[i, 5:9]
perturb <- cont[i, 4]
col_name <- paste(cont[i, 2], cont[i, 3], sep = '_') #estradiol_.00001
tmp <- sum(expression[,which(colnames(expression) == vehicle)])/5
tmp2 <- expression[,which(colnames(expression) == perturb)]
tmp3 <- tmp/tmp2
div <- cbind(div, tmp3)
colnames(div)[i + 1] <- col_name
}

Take those columns from expression where col.names == vehicle & perturb and apply division.

div <- expression$vehicle / expression$perturb #I'm not getting how I can pass here the value in `vehicle` and `perturb`

Assign this new variable a column name which should be a combination of drug_name and concentration

col.names(div) <- drug_name_concentration

assign it the row.names of expression:

row.names(div) <- row.names(expression)

So this process will iterate 271 times (nrow(cont) = 271), and each time the newly divided column will be cbind-ed to my previous div. Hence the final outcome will be:

                arachidonic acid_0.000010     oligomycin_0.000001 .........
1007_s_at            0.45                      0.30
1053_at              1.34                      0.65
117_at               0.11                      0.67
.....
.....

The logic is clear in my head but I am not getting how I can do it. Thanks for your help.

amwill04 · Accepted Answer

You are not assigning the variables correctly in the loop. Below is a sample loop that goes over each row and assigns the variables correctly; for example, on the first iteration i == 1. Note that I have changed how the column name is generated.

for(i in 1:nrow(cont)){
       vehicle <- cont[i, 3]
       perturb <- cont[i, 4]
       col_name <- paste(cont[i, 5], cont[i, 6], sep = '_')
    }

To then select the column whose name is stored in one of these variables, you can use the

df[,which(colnames(df) == x)]

approach, where df is your data frame and x is the variable holding the column name.
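
As a small illustration of this lookup, here is a sketch on a made-up data frame (toy and x are hypothetical names, not taken from the question). It also suggests why expression$vehicle in the question does not work: $ looks for a column literally named vehicle, whereas indexing with [ , ] or [[ ]] uses the value stored in the variable.

```r
# Toy data frame; the column names only mimic the sample ids in the question.
toy <- data.frame(GSM1 = c(1, 2, 3), GSM2 = c(10, 20, 30))
x <- "GSM2"   # name of the wanted column, stored in a variable

by_which <- toy[, which(colnames(toy) == x)]  # approach described above
by_name  <- toy[, x]                          # index by name directly
by_list  <- toy[[x]]                          # list-style extraction

identical(by_which, by_name)   # TRUE
identical(by_name, by_list)    # TRUE
is.null(toy$x)                 # TRUE: there is no column called "x"
```

All three forms return the same numeric vector, so the shorter toy[[x]] may be easier to read when x holds a single name.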

Therefore,

div <- data.frame(row.names(expression))
for(i in 1:nrow(cont)){
       vehicle <- cont[i, 3]
       perturb <- cont[i, 4]
       col_name <- paste(cont[i, 5], cont[i, 6], sep = '_')

       tmp <- expression[,which(colnames(expression) == vehicle)]/
                    expression[,which(colnames(expression) == perturb)]

       div <- cbind(div, tmp)

       colnames(div)[i + 1] <- col_name
}

div <- div[,-1]
row.names(div) <- row.names(expression)

The loop goes through each row of cont, assigns that row's values to the variables, finds the matching columns in expression, and divides one resulting vector by the other.

It then binds the result by column to the div data frame, which was created before the loop with the row names of expression as its first column.

Finally, it names the new column. Once the loop has finished, the first column, whose values are now redundant, is dropped and the row names of expression are assigned to div.

EDIT - question changed

change #1

vehicle <- cont[i, 5:9]

to

vehicle <- unlist(cont[i, 5:9]) ## note unlist(): flattens the one-row data frame into a plain vector

change #2

tmp <- sum(expression[,which(colnames(expression) == vehicle)])/5

to

tmp <- rowSums(expression[,which(colnames(expression) %in% vehicle)])/5 ## %in% instead of ==, and rowSums() to keep one value per row
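
To see why %in% matters here, a minimal sketch on invented data (toy and vehicle_ids are hypothetical names): == compares position by position and recycles the shorter vector, whereas %in% tests set membership. The last two lines show that sum() collapses the selected columns to a single number, while rowSums() keeps one value per probe.

```r
# Invented 2-probe, 4-sample table.
toy <- data.frame(A = c(1, 2), B = c(3, 4), C = c(5, 6), D = c(7, 8),
                  row.names = c("probe1", "probe2"))
vehicle_ids <- c("C", "A")   # columns to average, in arbitrary order

eq_match <- colnames(toy) == vehicle_ids    # vehicle_ids is recycled to C, A, C, A
in_match <- colnames(toy) %in% vehicle_ids  # is each column name in the set?

eq_match  # FALSE FALSE  TRUE FALSE  (column A is silently missed)
in_match  #  TRUE FALSE  TRUE FALSE

picked <- toy[, in_match]   # columns A and C
sum(picked) / 2             # one number for the whole table: 7
rowSums(picked) / 2         # one mean per probe: probe1 = 3, probe2 = 4
```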

FINAL EDIT

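Full working loop:

# start div with a placeholder column; it is dropped after the loop
div <- data.frame(row.names(expression))
for(i in 1:nrow(cont)){

  perturb <- cont[i, 4]
  col_name <- paste(cont[i, 2], cont[i, 3], sep = '_')
  # the five vehicle column names for this row, as a plain vector
  vehicle <- cont[i, c(5:9)]
  vehicle <- unname(unlist(vehicle[1,]))
  tmp <- expression[,which(colnames(expression) %in% vehicle)]
  # per-row mean of the five vehicle columns
  row_tots <- as.data.frame(rowSums(tmp))
  row_tots <- row_tots/5

  # mean vehicle value divided by the perturbed sample
  tmp <- row_tots/expression[,which(colnames(expression) == perturb)]
  div <- cbind(div, tmp)
  colnames(div)[i + 1] <- col_name
}
div <- div[,-1]
row.names(div) <- row.names(expression)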
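
As an alternative to growing div with cbind, here is a minimal sketch of the same calculation with sapply, run on invented data. expr_toy, cont_toy and ratio_for_row are hypothetical names, and the column positions (2, 3, 4, 5:6) are assumptions that mirror the layout of cont in the question, with two vehicle columns instead of five.

```r
# Invented expression table: rows are probes, columns are samples.
expr_toy <- data.frame(S1 = c(2, 4), S2 = c(4, 8), S3 = c(1, 2), S4 = c(3, 8),
                       row.names = c("probeA", "probeB"))

# Invented design table: name, concentration, perturbed sample, two vehicle samples.
cont_toy <- data.frame(old_pert = c("x1", "x2"),
                       cmap_name = c("drugA", "drugB"),
                       conc = c("0.1", "0.5"),
                       perturb_geo = c("S3", "S4"),
                       t1 = c("S1", "S1"), t2 = c("S2", "S2"),
                       stringsAsFactors = FALSE)

# Mean of the vehicle columns divided by the perturbed column, for row i.
ratio_for_row <- function(i) {
  vehicle <- unlist(cont_toy[i, 5:6])   # vehicle sample names as a vector
  perturb <- cont_toy[i, 4]             # perturbed sample name
  rowMeans(expr_toy[, vehicle]) / expr_toy[[perturb]]
}

# One column per row of cont_toy, built in a single step.
div_toy <- as.data.frame(sapply(seq_len(nrow(cont_toy)), ratio_for_row))
colnames(div_toy) <- paste(cont_toy$cmap_name, cont_toy$conc, sep = "_")
rownames(div_toy) <- rownames(expr_toy)
div_toy
```

Note that rowMeans divides by the number of columns actually selected, which differs from the fixed division by 5 above if a vehicle sample is missing from the expression table.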
