Looping through each row and each column at minimum computational time

Question

I have a dataset with 79003 rows and 97 columns. My dataset looks like this (only the first 10 columns are shown):

col1    col2    1          2  3          4           5          6         7         8
str_11  str_44  0.2064191  0  0.6061358  0.92798677  2.7899374  1.098612  1.395511  0.000000
str_11  str_22  0.9044563  0  1.7917595  0.00000000  1.1412787  1.504077  1.008228  0.000000
str_11  str_18  0.8266786  0  0.5389965  0.81676114  0.2787134  0.000000  3.295837  0.000000
str_11  str_1   0.8176492  0  5.0673306  4.45461768  0.8664189  6.549293  1.686399  2.079442

I am trying to iterate through each row and each column: compute the minimum and maximum of each column (from the third column on) and rescale every value with them, like this:

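cols <- 3:ncol(log_trans2)
# Column-wise min and max, computed once before any value is overwritten
col_min <- sapply(cols, function(j) min(log_trans2[[j]]))
col_max <- sapply(cols, function(j) max(log_trans2[[j]]))

for (i in 1:nrow(log_trans2)){
    for (k in seq_along(cols)){
        j <- cols[k]
        # (r - min(col)) / (max(col) - min(col)): the numerator needs parentheses
        log_trans2[i, j] <- (log_trans2[i, ..j] - col_min[k]) / (col_max[k] - col_min[k])
    }}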

I added ..j after getting this error:

"Error in [.data.table(log_trans2, i, j) : j (the 2nd argument inside [...]) is a single symbol but column name 'j' is not found. Perhaps you intended DT[, ..j]. This difference to data.frame is deliberate and explained in FAQ 1.1."

After that change the code runs, but it takes hours to finish. How can I reduce the run time, for example with foreach or an apply-family function?

The formula, applied to each value r in a column col:

(r - min(col)) / (max(col) - min(col))
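As a sketch of the formula in plain R: the helper below (min_max_rescale is a hypothetical name, not from the question) computes the min and max once per column and rescales the whole column in one vectorized step, and lapply then applies it to every numeric column. The small data frame is made-up illustration data, not the real dataset. For a data.table such as log_trans2, the in-place equivalent should be log_trans2[, (cols) := lapply(.SD, min_max_rescale), .SDcols = cols] with cols <- 3:ncol(log_trans2), though that line is not exercised here.

```r
# min_max_rescale is a hypothetical helper name (not from the question).
# It applies (r - min(col)) / (max(col) - min(col)) to a whole column at once,
# so min and max are computed only once per column.
min_max_rescale <- function(x) {
  rng <- range(x)                      # rng[1] is min(x), rng[2] is max(x)
  (x - rng[1]) / (rng[2] - rng[1])
}

min_max_rescale(c(0, 2, 4))
# [1] 0.0 0.5 1.0

# Made-up example data: two ID columns followed by numeric columns
df <- data.frame(col1 = c("str_11", "str_11", "str_11"),
                 col2 = c("str_44", "str_22", "str_18"),
                 a    = c(1, 3, 5),
                 b    = c(10, 0, 5))

# Rescale every numeric column (3 onward); the ID columns are left as they are
num_cols <- 3:ncol(df)
df[num_cols] <- lapply(df[num_cols], min_max_rescale)
df
```

Because each column is handled by one vectorized expression, the work is one pass per column instead of one interpreted loop iteration per cell.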

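The expected output is below. The min and max are taken over all 79003 rows of each column, so these values differ from what the four rows above would give on their own: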

col1    col2    1                  2  3                  4                  5                  6                  7                  8
Str_11  Str_44  0.029847796820572  0  0.080259104746805  0.11295123566895   0.405795371744574  0.138441206009843  0.167481921848205  0
Str_11  Str_22  0.130782597207229  0  0.237248831160936  0                  0.165998575836442  0.189535736027761  0.121002270272179  0
Str_11  Str_18  0.119536094709514  0  0.071369116220582  0.099413248590107  0.040538762756246  0                  0.39554907557078   0
Str_11  Str_1   0.118230460268521  0  0.670970792433184  0.54220015449667   0.126020313332003  0.825306647460321  0.202392768285567  0.251126401405454

Ronak Shah · Accepted Answer

Here is one way to do this without loops:

#Exclude columns which are not required for calculation
temp <- as.matrix(df[, -c(1:2)])
#Get column-wise minimum
min_vals <- matrixStats::colMins(temp)
#Get column-wise maximum
max_vals <- matrixStats::colMaxs(temp)
#Subtract minimum value of column from each element
s1 <- sweep(temp, 2, min_vals, `-`)
#Divide it by max - min
sweep(s1, 2, (max_vals - min_vals), `/`)
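Base R's scale() can do the same two sweep() steps in one call, since it subtracts a per-column center and divides by a per-column scale. The sketch below uses apply() as a stand-in for matrixStats::colMins/colMaxs and a three-column matrix built from the question's sample values; because it only sees these four rows, its results will not match the expected output, which used the full columns. It also assumes you want a constant column (such as column 2 in the four-row sample) to become 0 rather than NaN from 0/0; that is a choice made here, not part of the answer above.

```r
# Three columns of the question's sample data (four rows only)
temp <- cbind(c(0.2064191, 0.9044563, 0.8266786, 0.8176492),
              c(0, 0, 0, 0),
              c(0.6061358, 1.7917595, 0.5389965, 5.0673306))

min_vals <- apply(temp, 2, min)   # base-R stand-in for matrixStats::colMins
max_vals <- apply(temp, 2, max)   # base-R stand-in for matrixStats::colMaxs
rng <- max_vals - min_vals
rng[rng == 0] <- 1                # assumption: constant columns become 0, not NaN

# scale() subtracts center[j] from column j, then divides it by scale[j]
scaled <- scale(temp, center = min_vals, scale = rng)

# Drop the bookkeeping attributes scale() attaches to its result
attr(scaled, "scaled:center") <- NULL
attr(scaled, "scaled:scale") <- NULL
scaled
```

The result is numerically the same as the two sweep() calls; which one to use is mostly a matter of readability.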
