user2807119

Reputation: 353

Counting zeros in each column of a data frame in R and expressing them as a percentage

I want to count the number of zeros in each column of an R data frame and express it as a percentage. This percentage should be added as the last row of the original data frame. Example:

x <- c(0, 4, 6, 0, 10)
y <- c(3, 0, 9, 12, 15)
z <- c(3, 6, 9, 0, 15)

data_a <- cbind(x,y,z)

I want to count the zeros in each column and express the count as a percentage of the number of rows.

Thanks

Upvotes: 6

Views: 33063

Answers (4)

King_Aardvark

Reputation: 34

This is probably inelegant, but this is how I went about it when my columns had NAs:

#Returns the number of zeroes in each column
numZero <- colSums(vars == 0, na.rm = TRUE)

#Returns the number of NA entries in each column
numNA <- colSums(is.na(vars))

#Returns total sample size
numSamp <- rep(nrow(vars), ncol(vars))

#Combine the three
varCheck <- as.data.frame(cbind(numZero, numNA, numSamp))

#Number of non-NA observations for each variable
varCheck$numTotal <- varCheck$numSamp - varCheck$numNA

#Fraction of non-NA entries that are zero (multiply by 100 for a percentage)
varCheck$pctZero <- varCheck$numZero / varCheck$numTotal

#Check which variables are more than 99% zero
varCheck[which(varCheck$pctZero > 0.99),]
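
To try the steps above on data with missing values, here is a minimal self-contained sketch. The data frame `vars` below is made up for illustration (the answer does not show its data), and `numValid` is a name introduced here for the count of non-NA entries. The percentage is computed over non-NA entries only.

```r
# Hypothetical example data; the original `vars` is not shown in the answer
vars <- data.frame(
  a = c(0, 0, NA, 5),
  b = c(1, 0, 2, 3),
  c = c(0, 0, 0, NA)
)

numZero  <- colSums(vars == 0, na.rm = TRUE)  # zeros per column, ignoring NAs
numValid <- colSums(!is.na(vars))             # non-NA entries per column
pctZero  <- numZero / numValid * 100          # percentage of zeros among non-NA entries

# One row per variable, one column per summary statistic
varCheck <- data.frame(numZero, numValid, pctZero)
print(varCheck)
```

Dividing by the non-NA count rather than by nrow(vars) keeps missing values from diluting the percentage; use nrow(vars) instead if NAs should count toward the denominator.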

Upvotes: 0

Chirayu Chamoli

Reputation: 2076

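Here is one more method using lapply. Note that it only works on a data frame: data_a in the question is a matrix, so lapply would iterate over its individual elements unless you convert it first with as.data.frame(data_a).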

lapply(as.data.frame(data_a), function(x) length(which(x == 0)) / length(x) * 100)
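
Since the question also asks for the percentage to be appended as the last row, the following sketch shows one way to do that with a real data frame. The names `df`, `pct_zero`, and `df_with_pct` are illustrative and not from the thread, and sapply is swapped in for lapply so that the result is a plain named vector.

```r
x <- c(0, 4, 6, 0, 10)
y <- c(3, 0, 9, 12, 15)
z <- c(3, 6, 9, 0, 15)

df <- data.frame(x, y, z)   # a real data frame, unlike cbind(x, y, z)

# Percentage of zeros per column; sapply returns a named numeric vector
pct_zero <- sapply(df, function(col) mean(col == 0) * 100)

# Append the percentages as an extra row below the five data rows
df_with_pct <- rbind(df, pct_zero = pct_zero)
print(df_with_pct)
```

Keep in mind that the appended row mixes a summary with the raw observations, so later column-wise calculations on `df_with_pct` would need to exclude it.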

Upvotes: 8

Jilber Urbina

Reputation: 61154

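A combination of prop.table and some *apply work can give you the same answer as @Roland's. Note that this takes the first entry of each column's table, so it relies on 0 being the smallest value present in every column.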

> prop <- apply(data_a, 2, function(x) prop.table(table(x))*100)
> rbind(data_a, sapply(prop, "[", 1))
      x  y  z
[1,]  0  3  3
[2,]  4  0  6
[3,]  6  9  9
[4,]  0 12  0
[5,] 10 15 15
[6,] 40 20 20

Upvotes: 2

Roland

Reputation: 132706

x <- c(0, 4, 6, 0, 10)
y <- c(3, 0, 9, 12, 15)
z <- c(3, 6, 9, 0, 15)

data_a <- cbind(x,y,z)
# This is a matrix, not a data.frame.

res <- colSums(data_a==0)/nrow(data_a)*100

If you must, rbind the result to the matrix (usually not a good idea, because the last row then holds percentages rather than data).

rbind(data_a, res)
#      x  y  z
#      0  3  3
#      4  0  6
#      6  9  9
#      0 12  0
#     10 15 15
# res 40 20 20
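
As an alternative to appending a row, the percentages can be kept in a separate summary object. The sketch below is a small variation on the code above, not part of the original answer: it uses colMeans instead of colSums divided by nrow, and the names `summary_tbl` and `pct_zero` are made up for illustration.

```r
data_a <- cbind(x = c(0, 4, 6, 0, 10),
                y = c(3, 0, 9, 12, 15),
                z = c(3, 6, 9, 0, 15))

# The mean of a logical vector is the proportion of TRUE values,
# so colMeans(data_a == 0) is the fraction of zeros per column
res <- colMeans(data_a == 0) * 100

# Keep the summary separate from the raw data: one row per column of data_a
summary_tbl <- data.frame(column = names(res), pct_zero = unname(res))
print(summary_tbl)
```

This leaves data_a untouched, so later calculations on the matrix do not have to skip a summary row.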

Upvotes: 14
