Reputation: 81
I have some data tables that consist of several columns and many thousands of rows. My data looks something like:
iteration  V1  V2  V3  V4
        1  -2   3  -4   1
        2  -2   3  -3   4
        3  -2   3   7  -8
        4  -2   3  -4   2
        5  -2   3  -4  -3
I have been trying to figure out how to calculate counts of positive values in each column, and the proportion of positive counts to all counts in a column.
This seems fairly simple but I can't figure out how to output a data.table that has counts by column in it.
I can do this by combining a bunch of statements like the following, but there has to be a better way. Any advice for a tired mind?
nrow(dat[V2 > 0])
Upvotes: 0
Views: 1089
Reputation: 656
Assuming your data frame is called df:
df <- data.frame('V1'=c(-2, -2, -2, -2, -2), 'V2'=c(3, 3, 3, 3, 3), 'V3'=c(-4, -3, 7, -4, -4), 'V4'=c(1, 4, -8, 2, -3))
you could start by defining the number of rows as:
nRows <- dim(df)[1]
Then, you can define an auxiliary function as such:
calcStats <- function(x) {
  # count the strictly positive entries in column x of df
  pos <- sum(df[, x] > 0)
  c("number of positives" = pos, "proportion of positives" = pos / nRows)
}
and get the result with:
result <- as.data.frame(Map(calcStats, colnames(df)))
                        V1 V2  V3  V4
number of positives      0  5 1.0 3.0
proportion of positives  0  1 0.2 0.6
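As a possible shortcut, the same two rows can be computed without a helper function, since comparing a data frame to 0 yields a logical matrix that base R's colSums and colMeans can summarise directly. This is only a minimal sketch: it assumes the iteration column from the question is present and should be left out, and the names vals and stats are made up for illustration.

```r
# Sample data from the question, including the iteration column
df <- data.frame(
  iteration = 1:5,
  V1 = c(-2, -2, -2, -2, -2),
  V2 = c(3, 3, 3, 3, 3),
  V3 = c(-4, -3, 7, -4, -4),
  V4 = c(1, 4, -8, 2, -3)
)

# Drop the iteration counter so only the value columns are summarised
# (assumption: every remaining column is numeric)
vals <- df[, setdiff(names(df), "iteration"), drop = FALSE]

# vals > 0 is a logical matrix: colSums counts the TRUE entries per column,
# colMeans gives the share of TRUE entries per column
stats <- rbind(
  "number of positives"     = colSums(vals > 0),
  "proportion of positives" = colMeans(vals > 0)
)
print(stats)
```

Here stats is a numeric matrix with one column per variable; as.data.frame(stats) should convert it if a data frame is preferred. Note that any NA in a column would make that column's count and proportion NA unless na.rm = TRUE is passed to both functions.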
Upvotes: 1