Reputation: 31
This is my first question in this forum and I have limited experience with R, so my apologies if the question is unclear in some way or if it is too basic.
I have a dataframe called values that consists of a sample number column, two factor variables (H and W), and several number columns (named as number intervals after a cut), as follows:
sample H W (12.95,13] (13,13.05] (13.05,13.1] (13.1,13.15]
130 bg d 0 0 0 0
131 bg d 0 0 0 0
132 bg d 0 0 0 0
133 x i 0 0 0 0
134 x i 0 0 0 0
135 x i 0 0 0 0
136 x i 0 0 0 0
137 x i 0 0 0 0
138 x i 0 0 0 0
139 x i 0 0 0 0
140 x i 0 0 0 0
141 x i 0 35947.65 0 0
142 x i 0 0 0 0
143 x i 0 0 0 0
144 x i 0 0 0 0
145 x i 0 0 0 73709.67
146 x i 0 0 0 0
147 x i 21756.63 0 32362.41 0
148 x i 0 0 0 0
149 x i 0 0 0 0
150 x i 0 0 0 0
151 x i 0 0 0 0
152 x c 0 0 0 0
153 x c 0 0 0 0
154 x c 0 0 0 0
155 x c 0 0 0 32578.03
I need to count how many rows in each of the number columns for each treatment combination and sample number have values greater than 0. I have tried the aggregate, count and sum functions but have had no success so far.
Could someone help me out with this?
Thank you!
Upvotes: 3
Views: 625
Reputation: 3634
With data table (and assuming df
is your data frame):
library(data.table)
setDT(df)[`colname`>0, .N, by=list(H, W, sample)]
or
setDT(df)[`colname`>0, .N, by=list(H, W)]
if you don't care about sample
.
Where you should replace colname
with the name of the particular column you're looking at. This would be easier for me to check if you provided a reproducible example.
Upvotes: 2
Reputation: 7190
Maybe I didn't get it (my solution seems to be very simple), but I just apply on rows the sum of where columns are all different from 0. The output is a numeric vector with length equal to the number of row of you data where:
1 means there is at least one column with a value greater than 0 and so on...
apply(!df[, 4:7] == 0, 1, sum)
[1] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 2 0 0 0 0 0 0 0 1
Upvotes: 0
Reputation: 19005
#replicable example
set.seed(123)
values <- data.frame(sample=1:100,
a=rep(1,100),
b=rep(c(1,2),50),
v1=rbinom(100,1,.1) * runif(100),
v2=rbinom(100,1,.1) * runif(100),
v3=rbinom(100,1,.1) * runif(100)
)
aggregate(cbind(v1, v2, v3) ~ a + b, # apply fcn to LHS grouped by RHS
data=values,
FUN=function(x) sum(x>0) # sum of TRUE v>0's is count of v's greater than 0
)
# a b v1 v2 v3
# 1 1 1 4 4 7
# 2 1 2 3 6 2
Upvotes: 0
Reputation: 1503
An imperfect solution using plyr
(I am pretty sure dplyr
package can do even better job, but I am less familiar with it)
The downside is that sums have to be calculated for each column separately - if there are 3 or 4 of them, it is OK, but for 100 intervals that wouldn't be viable.
##Generate fake data with 3 samples, 2 factors 3 levels each
##and 3 observations per combination
df <- expand.grid(sample = letters[1:3],
f1 = paste0('x', 1:3),
f2 = paste0('y', 1:3))
df <- rbind(df, df, df)
nums <- matrix(rnorm(4*nrow(df)), ncol = 4)
colnames(nums) <- paste0('val_', 1:4)
nums[nums < 1] <- 0
df <- cbind(df, nums)
##Summarize
require(plyr)
ddply(df, .(sample, f1, f2), summarize,
sum_1 = sum(val_1 > 0),
sum_2 = sum(val_2 > 0))
Upvotes: 0