Reputation: 411
This is a follow-on from an earlier question: R: Find the Variance of all Non-Zero Elements in Each Row, but the problem is explained in full below.
I have a data frame d like this:
d <- data.frame(ID = 1:4, Value1=c(0,12,0,0), Value2=c(12,0,10,0), Value3=c(21,0,0,8), Value4=c(18,5,17,29))
ID Value1 Value2 Value3 Value4
 1      0     12     21     18
 2     12      0      0      5
 3      0     10      0     17
 4      0      0      8     29
What I would like to do is calculate the variance for each person (ID), based on every value in the row including and after the first non-zero value.
So for instance, in this example the variance for ID 1 would be var(c(12, 21, 18)), for ID 2 it would be var(c(12, 0, 0, 5)), for ID 3 it would be var(c(10, 0, 17)), and for ID 4 it would be var(c(8, 29)).
How would I go about this? I currently have the following code, which removes all zeros, as opposed to just those before a non-zero value:
varfunc <- function(x) var(x[x > 0])
variances = apply(d[,c(-1)], 1, varfunc)
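To show where this goes wrong, here is a small check on the values for ID 2 (the name row2 is only for illustration). Because every zero is dropped, the function effectively computes var(c(12, 5)) instead of the variance I am after:

```r
# Current approach: drops every zero, wherever it sits in the row
varfunc <- function(x) var(x[x > 0])

# Value1..Value4 for ID 2
row2 <- c(12, 0, 0, 5)

varfunc(row2)   # 24.5, i.e. var(c(12, 5))
var(row2)       # 32.25, the desired result, since the row starts with a non-zero value
```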
Upvotes: 2
Views: 625
Reputation: 4807
Apply a variance function row-wise (using apply, where the 1st margin is rows); that variance function should subset the values in the row by taking the first value that is not 0 and all subsequent values (which(x!=0)[1]:length(x) provides the indices to use for the subset).
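To make the indexing concrete, here is a minimal sketch using the values for ID 3 from the question. The variable names (row3, nonzero_pos, first_pos, keep) are only for illustration:

```r
# Values for ID 3 from the question, without the ID column
row3 <- c(0, 10, 0, 17)

nonzero_pos <- which(row3 != 0)        # positions of all non-zero values: 2 and 4
first_pos   <- nonzero_pos[1]          # position of the first non-zero value: 2
keep        <- first_pos:length(row3)  # indices 2, 3, 4

row3[keep]        # 10 0 17 (the zero after the first non-zero value is kept)
var(row3[keep])   # 73
```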
Here is your solution:
Data <- data.frame(ID = 1:5, Value1=c(0,12,0,0,0), Value2=c(12,0,10,0,0), Value3=c(21,0,0,8,0), Value4=c(18,5,17,29,0))
var.after0 <- function(x){
  x.vals <- as.numeric(x[-1]) # drop the ID; need to convert b/c x can be a data.frame, not just matrix
  if(all(x.vals==0)){
    return(0) # just return a 0 here so we don't create an empty subset later
  }else{
    n.vals <- length(x.vals) # how many values?
    x.vals.not0 <- which(x.vals!=0) # positions (indices) of values that are not 0
    first.not0 <- x.vals.not0[1] # the position of the first non-0 value
    x.vals.after0 <- x.vals[first.not0:n.vals] # all values from the first non-0 onward (later 0s are kept)
    var(x.vals.after0) # variance of that subset
  }
}
apply(Data, 1, var.after0)
which returns:
[1] 21.00 32.25 73.00 220.50 0.00
Note: I have added an extra row to your data set which contains all 0's. This is an important case that the variance function should be able to handle in order to be robust. Thought such an adjustment would come in handy. Feel free to copy it into your original question if you agree.
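To illustrate why the all-zero guard matters, the sketch below uses a hypothetical unguarded helper (var_unguarded, written only for this illustration). It also shows a related case that the guard does not cover: when the only non-zero value is in the last column, the subset has a single element and var() returns NA, so you may want to decide how to treat such rows.

```r
# Hypothetical unguarded version, for illustration only
var_unguarded <- function(x) var(x[which(x != 0)[1]:length(x)])

all_zero <- c(0, 0, 0, 0)
first_pos <- which(all_zero != 0)[1]   # NA, because there is no non-zero position

# NA:length(x) is an error, so the unguarded call fails on an all-zero row
res <- tryCatch(var_unguarded(all_zero), error = function(e) "error")
res                                    # "error"

# Only the last value is non-zero: the subset has length 1, so var() is NA
last_only <- var_unguarded(c(0, 0, 0, 7))
last_only                              # NA
```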
Upvotes: 3
Reputation: 92300
I can't think of a way to avoid apply here, but here's a possible solution:
varfunc <- function(x) var(x[which(x != 0)[1L]:length(x)])
apply(d[-1], 1, varfunc)
## [1] 21.00 32.25 73.00 220.50
Basically, we subset each row from its first non-zero value through the last column and calculate the variance of that subset.
We can easily validate the results against your rules:
var(c(12, 21, 18))
## [1] 21
var(c(12, 0, 0, 5))
## [1] 32.25
var(c(10, 0, 17))
## [1] 73
var(c(8, 29))
## [1] 220.5
Upvotes: 3