Indexing in nested loops

Question

I am new to R and this site. My aim with the following, assuredly unnecessarily arcane code is to create an R function that produces a special type of box plot in ggplot2. First, I need to process the function's input by calculating the variables that I will later plot.

I start by generating some random data, called datos:

c1=rnorm(98,47,23)
c2=rnorm(98,56,13)
c3=rnorm(98,52,7)
fila1=as.matrix(t(c(-2,15,30)))
colnames(fila1)=c("c1","c2","c3")
fila2=as.matrix(t(c(-20,5,20)))
colnames(fila2)=c("c1","c2","c3")
datos=rbind(data.frame(c1,c2,c3),fila1,fila2)
rm(c1,c2,c3,fila1,fila2)
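A more compact way to build the same kind of test data is to create both parts as data frames and rbind() them. This is only a sketch: the set.seed() call is an added assumption (the original draws are unseeded), and the names aleatorios and fijos are hypothetical.

```r
# Arbitrary seed (an assumption, not in the original) so the draws are reproducible
set.seed(1)

# 98 random rows per column, as in the question
aleatorios <- data.frame(c1 = rnorm(98, 47, 23),
                         c2 = rnorm(98, 56, 13),
                         c3 = rnorm(98, 52, 7))

# The two fixed rows that fila1 and fila2 contributed
fijos <- data.frame(c1 = c(-2, -20),
                    c2 = c(15, 5),
                    c3 = c(30, 20))

# rbind() on two data frames matches the columns by name
datos <- rbind(aleatorios, fijos)
```

The result has the same shape as the question's datos: 100 rows and the columns c1, c2 and c3.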

Then I calculate the variables to be plotted later. For each column of datos, these are the mean (puntoMedio), the first and third quartiles (cuar1, cuar3), the interquartile range (iqr), the lower bound of potential submean whiskers (limInf), the upper bound of potential supermean whiskers (limSup), and the outliers (submean outliers vAtInf and supermean outliers vAtSup, to be combined in vAt):

puntoMedio=apply(datos,MARGIN=2,FUN=mean)
cuar1=apply(datos,MARGIN=2,FUN=quantile,probs=.25)
cuar3=apply(datos,MARGIN=2,FUN=quantile,probs=.75)
cuar=rbind(cuar1,cuar3)
iqr=apply(cuar,MARGIN=2,FUN=diff)
cuar=rbind(cuar,iqr,puntoMedio)
limInf=array(dim=ncol(datos))
  for(i in 1:ncol(datos)){
    limInf0=as.matrix(t(cuar[1,]-1.5*cuar[3,]))
    if(length(datos[datos[,i]<limInf0[,i],i])>0){
      limInf[i]=limInf0[,i]
    }else{limInf[i]=min(datos[,i])}
  }
limSup=array(dim=ncol(datos))
  for(i in 1:ncol(datos)){
    limSup0=as.matrix(t(cuar[2,]+1.5*cuar[3,]))
    if(length(datos[datos[,i]>limSup0[,i],i])>0){
      limSup[i]=limSup0[,i]
    }else{limSup[i]=max(datos[,i])}
  }
d=data.frame(t(rbind(cuar,limInf,limSup)))
rm(cuar)
vAtInf=datos
  for(i in 1:ncol(vAtInf)){
    vAtInf[vAtInf[,i]>limInf0[,i],i]=NA
  }
  colnames(vAtInf)=c("vAtInfc1","vAtInfc2","vAtInfc3")
vAtSup=datos
  for(i in 1:ncol(vAtSup)){
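    vAtSup[vAtSup[,i]<limSup0[,i],i]=NA
  }
  colnames(vAtSup)=c("vAtSupc1","vAtSupc2","vAtSupc3")
datos=cbind(datos,vAtInf,vAtSup)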



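For comparison, here is a sketch of the same calculations without explicit loops, assuming datos still holds only its three original columns. It rests on one observation about the limInf/limSup loops: choosing the 1.5*IQR fence when some value lies beyond it, and the column minimum otherwise, is the same as taking max(fence, min(x)), and symmetrically min(fence, max(x)) for limSup. The names resumen, valla_inf and valla_sup are hypothetical, and the stand-in data use an arbitrary seed.

```r
# Stand-in data shaped like the question's datos (seed is arbitrary)
set.seed(1)
datos <- data.frame(c1 = c(rnorm(98, 47, 23), -2, -20),
                    c2 = c(rnorm(98, 56, 13), 15, 5),
                    c3 = c(rnorm(98, 52, 7), 30, 20))

# All per-column statistics from one function, applied with sapply()
resumen <- function(x) {
  q <- unname(quantile(x, probs = c(.25, .75)))
  iqr <- q[2] - q[1]
  c(cuar1 = q[1], cuar3 = q[2], iqr = iqr, puntoMedio = mean(x),
    # Same rule as the loops: the 1.5*IQR fence if a value lies beyond it,
    # otherwise the extreme observation; both cases reduce to max()/min()
    limInf = max(q[1] - 1.5 * iqr, min(x)),
    limSup = min(q[2] + 1.5 * iqr, max(x)))
}
d <- data.frame(t(sapply(datos, resumen)))

# Fences (limInf0/limSup0 in the question), one per column
valla_inf <- d$cuar1 - 1.5 * d$iqr
valla_sup <- d$cuar3 + 1.5 * d$iqr

# NA-padded outlier columns: blank out values on the inner side of each fence
vAtInf <- as.data.frame(Map(function(x, f) replace(x, x > f, NA), datos, valla_inf))
vAtSup <- as.data.frame(Map(function(x, f) replace(x, x < f, NA), datos, valla_sup))
names(vAtInf) <- paste0("vAtInf", names(datos))
names(vAtSup) <- paste0("vAtSup", names(datos))

# Nine columns: values, submean outliers, supermean outliers
datos <- cbind(datos, vAtInf, vAtSup)
```

Here d has one row per original column (c1, c2, c3) and the same six columns as in the question, and datos gains the six NA-padded outlier columns.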
Everything up to and including the construction of vAtInf and vAtSup works as desired. I now have two data frames, d and datos. The former is of no interest here; the latter, in this specific case, comprises nine columns: three of all values, three of the corresponding submean outliers and three of the corresponding supermean outliers (these latter six padded with NA). I now want to extract all outliers by column, so I wrote the following loop. It runs without any error or warning, but it does not give the desired output in vAt (again, the outliers from columns 4:9 of datos, by column). As far as I can tell, the problem occurs in the nested for loop, when values for column i are written into vAt: each iteration of the outer loop erases the previous one, so that when the whole loop has finished, vAt contains only NA and the outliers from the last column (that is, from the last iteration).

for(i in ((ncol(datos)/3)+1):ncol(datos)){
    vAt=matrix(nrow=.25*nrow(datos),ncol=ncol(datos)-(ncol(datos)/3))
    colnames(vAt)=c(((ncol(datos)/3)+1):ncol(datos))
    if(length(datos[,i][is.na(datos[,i])==F])>0){
        for(j in 1:(length(datos[,i][is.na(datos[,i])==F]))){
            nom=as.character(i)
            vAt[j,nom]=datos[,i][is.na(datos[,i])==F][j]
        }
    }else{next}
}


I have not been able to find an existing thread that answers my question. Thanks for any help.

musically_ut · Accepted Answer

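The problem is that you are initialising vAt inside the outer loop, so every iteration replaces it with a fresh matrix of NAs and discards what the previous iterations wrote. Moving the initialisation statements outside the for loop fixes the problem you are facing: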

vAt=matrix(nrow=.25*nrow(datos),ncol=ncol(datos)-(ncol(datos)/3))
colnames(vAt)=c(((ncol(datos)/3)+1):ncol(datos))
for(i in ((ncol(datos)/3)+1):ncol(datos)){
    if(length(datos[,i][is.na(datos[,i])==F])>0){
        for(j in 1:(length(datos[,i][is.na(datos[,i])==F]))){
            nom=as.character(i)
            vAt[j,nom]=datos[,i][is.na(datos[,i])==F][j]
        }
    }else{next}
}
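The effect can be reproduced on a tiny input. In this sketch, entrada is a hypothetical data frame with made-up values. In the first loop the result matrix is re-created on every pass, so only the last column survives; in the second it is created once before the loop. The sketch also indexes rows with seq_along(x), which, unlike 1:length(x), is empty when x is empty, so the if/else next guard is no longer needed: assigning zero values to zero rows does nothing.

```r
# Hypothetical input: two NA-padded columns
entrada <- data.frame(a = c(1, NA, 3), b = c(NA, 5, NA))

# Buggy pattern: the result is re-created on every pass of the loop
for (i in seq_along(entrada)) {
  res_dentro <- matrix(NA_real_, nrow = 3, ncol = 2)
  x <- entrada[[i]][!is.na(entrada[[i]])]
  res_dentro[seq_along(x), i] <- x
}

# Fixed pattern: create once, then fill column by column
res_fuera <- matrix(NA_real_, nrow = 3, ncol = 2)
for (i in seq_along(entrada)) {
  x <- entrada[[i]][!is.na(entrada[[i]])]
  res_fuera[seq_along(x), i] <- x
}
```

After both loops, the first column of res_dentro is all NA, while res_fuera holds c(1, 3, NA) and c(5, NA, NA).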

However, there are various improvements which you can make to the code as it stands:

Using vectorisation and *ply functions instead of for loops.
Not comparing logical vectors with ==F, but using !is.na(...) directly.
Using sum(!is.na(...)) instead of length(datos[,i][!is.na(...)]) to count the non-missing values.

And some more. These will not change the correctness of the code, but will make it more efficient and more idiomatic.
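Put together, these suggestions give a loop-free version of the extraction. This is a sketch on a small, hypothetical stand-in for columns 4:9 of datos (atipicos, with made-up values): lapply() drops the NAs column by column, sum(!is.na(x)) counts the outliers, and each column is padded with NA to the largest count before the columns are bound into vAt. Sizing vAt this way, rather than giving it .25*nrow(datos) rows, also avoids a "subscript out of bounds" error if a column ever has more outliers than that.

```r
# Hypothetical stand-in for the NA-padded outlier columns 4:9 of datos
atipicos <- data.frame(vAtInfc1 = c(-20, NA, NA, -25),
                       vAtInfc2 = rep(NA_real_, 4),
                       vAtSupc1 = c(NA, 140, NA, 150))

# *ply instead of loops: one list element per column, non-NA values only
vAt_lista <- lapply(atipicos, function(x) x[!is.na(x)])

# Outliers per column, counted with sum(!is.na(x))
n_at <- vapply(atipicos, function(x) sum(!is.na(x)), integer(1))

# Pad each column with NA up to the largest count, then bind into a matrix
vAt <- do.call(cbind, lapply(vAt_lista, function(x) {
  length(x) <- max(n_at)
  x
}))
```

If a padded matrix is not actually needed, vAt_lista is arguably the more natural result, because the columns have different numbers of outliers.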
