Let's say that a data frame contains four columns:
set.seed(123)
x1 <- runif(10,0,1)
x2 <- runif(10,0,1)
x3 <- runif(10,0,1)
x4 <- runif(10,0,1)
DF <- data.frame(x1,x2,x3,x4)
For each column, I want to calculate the number of observations that are less than or equal to 0.5. Here is my code, but it doesn't seem to be working:
a <- vector()
pvect1 <- vector()
for (j in 1:ncol(DF))
{
for (i in 1:nrow(DF))
{
if (DF[i,j] <= 0.5)
a[i]=1
else
a[i]=0
pvect1[j] <- cumsum(a[i])
}
}
Finally, I want to create a new data frame (let's call it DF2) that contains two columns (C1 and C2), where C1 is the column name in DF (x1, x2, x3, and x4) and C2 is the number of observations that are less than or equal to 0.5 in that column of DF.
Upvotes: 0
We can use colSums on a logical matrix to count the TRUE elements in each column:
v1 <- colSums(DF <= 0.5)
To create the data.frame:
DF2 <- data.frame(C1 = names(v1), C2 = v1, stringsAsFactors=FALSE)
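As a rough, self-contained sketch of the steps above: DF <= 0.5 gives a logical matrix, and summing each column counts its TRUE values. The names is_small, counts_colsums and counts_sapply are made up for illustration, and the sapply line is just an alternative way to get the same counts, not part of the approach above. unname() is used here only so that DF2 gets plain integer row names.

```r
# Rebuild the data from the question
set.seed(123)
x1 <- runif(10, 0, 1)
x2 <- runif(10, 0, 1)
x3 <- runif(10, 0, 1)
x4 <- runif(10, 0, 1)
DF <- data.frame(x1, x2, x3, x4)

# Comparing a data frame with a number gives a logical matrix
is_small <- DF <= 0.5

# TRUE counts as 1 and FALSE as 0, so column sums are the counts
counts_colsums <- colSums(is_small)

# Alternative (illustrative): count column by column with sapply
counts_sapply <- sapply(DF, function(col) sum(col <= 0.5))

# Assemble the two-column result; unname() drops the x1..x4 names from C2
DF2 <- data.frame(C1 = names(counts_colsums),
                  C2 = unname(counts_colsums),
                  stringsAsFactors = FALSE)
```

Note that colSums returns doubles while sum() on a logical vector returns integers, so the two count vectors compare equal with == but not with identical().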
If we really need to use for loops:
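a <- numeric(nrow(DF))      # pre-allocate: one indicator per row
pvect1 <- numeric(ncol(DF)) # pre-allocate: one count per column
for (j in seq_len(ncol(DF))) {
  for (i in seq_len(nrow(DF))) {
    if (DF[i, j] <= 0.5) {
      a[i] <- 1
    } else {
      a[i] <- 0
    }
  }
  # sum the indicators only after the whole column has been checked
  pvect1[j] <- sum(a)
}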
Checking against the vectorized solution:
identical(as.vector(v1), pvect1)
#[1] TRUE
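As for why the loop in the question goes wrong, the likely culprit is pvect1[j] <- cumsum(a[i]) inside the inner loop. cumsum of a single element is just that element, so pvect1[j] ends up holding the 0/1 indicator for the last row rather than a count. A small sketch with a made-up indicator vector a:

```r
# Hypothetical indicator vector: 1 where a value is <= 0.5
a <- c(1, 0, 1, 1)

last_only <- cumsum(a[4])  # running sum of one element: 1
running   <- cumsum(a)     # running totals: 1 1 2 3
total     <- sum(a)        # the count we actually want: 3
```

So the fix is to take sum(a) once per column, after the inner loop has finished.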
Upvotes: 2