Summary statistic for each column of a data frame in R

Question

Let's say that a data frame contains four columns:

set.seed(123)
x1 <- runif(10,0,1)
x2 <- runif(10,0,1)
x3 <- runif(10,0,1)
x4 <- runif(10,0,1)
DF <- data.frame(x1,x2,x3,x4)

For each column, I want to calculate the number of observations that are less than or equal to 0.5. Here is my code, but it doesn't seem to be working:

a <- vector()
pvect1 <- vector()

for (j in 1:ncol(DF))
{
  for (i in 1:nrow(DF))
  {

    if (DF[i,j] <= 0.5)
      a[i]=1
    else
      a[i]=0 

    pvect1[j] <- cumsum(a[i])    

  }
}

Finally, I want to create a new data frame (let's call it DF2) that contains two columns (C1 and C2), where C1 is the column name in DF (x1, x2, x3, and x4) and C2 is the number of observations that are less than or equal to 0.5 in that column of DF.

akrun · Accepted Answer

We can use colSums on a logical matrix to count the TRUE elements in each column. DF <= 0.5 compares every element to 0.5, and each TRUE counts as 1 when summed:

v1 <- colSums(DF <= 0.5)

To create the data.frame:

DF2 <- data.frame(C1 = names(v1), C2 = v1, stringsAsFactors=FALSE)
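Putting the two steps together, here is a minimal self-contained sketch that rebuilds DF from the question and prints DF2. One thing to be aware of: v1 is a named vector, so data.frame() also reuses x1 to x4 as row names. Passing row.names = NULL is an optional tweak that is not part of the answer above; it only resets the row names to 1 through 4.

```r
# Rebuild the example data from the question
set.seed(123)
DF <- data.frame(x1 = runif(10), x2 = runif(10), x3 = runif(10), x4 = runif(10))

# DF <= 0.5 is a logical matrix; colSums() counts the TRUE values per column
v1 <- colSums(DF <= 0.5)

# row.names = NULL (optional addition) drops the x1..x4 row names
DF2 <- data.frame(C1 = names(v1), C2 = v1,
                  row.names = NULL, stringsAsFactors = FALSE)
DF2
```

Without row.names = NULL the contents of C1 and C2 are the same; only the row labels differ.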

If we really need to use for loops:

a <- vector()      # it is better to pre-allocate the size
pvect1 <- vector() # same comment as above

for (j in 1:ncol(DF)) {
  for (i in 1:nrow(DF)) {
    if (DF[i, j] <= 0.5) {
      a[i] <- 1
    } else {
      a[i] <- 0
    }
  }
  pvect1[j] <- sum(a)  # count for column j, taken after the inner loop finishes
}
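For reference, the loop in the question fails because cumsum(a[i]) is applied to a single element, so it just returns that element, and the assignment sits inside the inner loop; pvect1[j] therefore ends up holding only the 0/1 flag of the last row. The comments above suggest pre-allocating, so here is a sketch of that variant, assuming the same DF as in the question. The use of numeric() and seq_len() is my own choice rather than part of the original answer.

```r
# Rebuild the example data from the question
set.seed(123)
DF <- data.frame(x1 = runif(10), x2 = runif(10), x3 = runif(10), x4 = runif(10))

# Pre-allocate both vectors at their final length
a <- numeric(nrow(DF))
pvect1 <- numeric(ncol(DF))

for (j in seq_len(ncol(DF))) {
  for (i in seq_len(nrow(DF))) {
    # Flag the observation: 1 if it is <= 0.5, otherwise 0
    a[i] <- if (DF[i, j] <= 0.5) 1 else 0
  }
  # Sum the flags once the whole column has been scanned
  pvect1[j] <- sum(a)
}
pvect1
```

Because a is fully overwritten for every column, it does not need to be reset between iterations of the outer loop.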

Checking against the vectorized solution:

identical(as.vector(v1), pvect1)
#[1] TRUE
