user3051065

Reputation: 411

R: Calculate Variance on all Values after the first Non-Zero

This is a follow-on from an earlier question: R: Find the Variance of all Non-Zero Elements in Each Row, but the problem is explained in full below.

I have a data frame d like this:

d <- data.frame(ID = 1:4, Value1=c(0,12,0,0), Value2=c(12,0,10,0), Value3=c(21,0,0,8), Value4=c(18,5,17,29))

ID  Value1  Value2  Value3  Value4
1   0       12      21      18               
2   12      0       0       5
3   0       10      0       17
4   0       0       8       29

What I would like to do is calculate the variance for each person (ID), based on every value in the row including and after the first non-zero value.

So for instance, in this example the variance for ID 1 would be var(c(12, 21, 18)), for ID 2 it would be var(c(12, 0, 0, 5)), for ID 3 it would be var(c(10, 0, 17)) and for ID 4 it would be var(c(8, 29)).

How would I go about this? I currently have the following code, which removes all zeros, as opposed to just those before a non-zero value:

varfunc <- function(x) var(x[x > 0])
variances = apply(d[,c(-1)], 1, varfunc)

Upvotes: 2

Views: 625

Answers (2)

rbatt

Reputation: 4807

Apply a variance function row-wise with apply (margin 1 means rows). The function should subset each row to the first non-zero value and every value after it; which(x!=0)[1]:length(x) gives the indices for that subset.

Here is your solution:

Data <- data.frame(ID = 1:5, Value1=c(0,12,0,0,0), Value2=c(12,0,10,0,0), Value3=c(21,0,0,8,0), Value4=c(18,5,17,29,0))

var.after0 <- function(x){
    x.vals <- as.numeric(x[-1]) # drop the ID column; as.numeric() guards against non-numeric input
    if(all(x.vals==0)){
        return(0) # all-zero row: return 0 so we never index with NA below
    }else{
        n.vals <- length(x.vals) # how many values?
        x.vals.not0 <- which(x.vals!=0) # positions (indices) of values that are not 0
        first.not0 <- x.vals.not0[1] # the position of the first non-0 value
        x.vals.after0 <- x.vals[first.not0:n.vals] # all values from the first non-0 onward (later 0s are kept)
        var(x.vals.after0) # variance of that subset
    }
}

apply(Data, 1, var.after0)

which returns:

[1]  21.00  32.25  73.00 220.50   0.00

Note: I have added an extra row to your data set which contains all 0's. This is an important case that the variance function should be able to handle in order to be robust. Thought such an adjustment would come in handy. Feel free to copy it into your original question if you agree.
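
As a rough sketch of the same rule in a more compact form, the guard for all-zero rows can be folded into a short helper. The name var_from_first_nonzero is hypothetical (it is not from the answer above), and returning 0 for an all-zero row simply follows the convention chosen above. Note that a row whose only non-zero value is in the last column leaves a single value, for which var() returns NA.

```r
# Hypothetical helper: variance of all values from the first non-zero onward.
# Expects only the value columns (no ID).
var_from_first_nonzero <- function(x) {
  x <- as.numeric(x)
  first_pos <- match(TRUE, x != 0)   # position of the first non-zero; NA if none
  if (is.na(first_pos)) return(0)    # all-zero row: 0 by the convention used above
  var(x[first_pos:length(x)])        # later zeros are kept in the subset
}

Data <- data.frame(ID = 1:5, Value1 = c(0, 12, 0, 0, 0), Value2 = c(12, 0, 10, 0, 0),
                   Value3 = c(21, 0, 0, 8, 0), Value4 = c(18, 5, 17, 29, 0))

# Data[-1] drops the ID column before applying the helper to each row
res <- apply(Data[-1], 1, var_from_first_nonzero)
res
```

This should give the same five values as var.after0 on the extended data set.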

Upvotes: 3

David Arenburg

Reputation: 92300

I can't think of a way to avoid apply here, but here's a possible solution:

varfunc <- function(x) var(x[which(x != 0)[1L]:length(x)])
apply(d[-1], 1, varfunc)
## [1]  21.00  32.25  73.00 220.50

Basically, we subset each row from its first non-zero value through the last column and calculate the variance of that subset.
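
To see what the subsetting does, here is a step-by-step sketch for a single row (ID 3 from the question, without the ID column). The intermediate names nonzero_pos, first_pos and kept are only illustrative; the one-liner above does the same thing in a single expression.

```r
# Values for ID 3 from the question
x <- c(0, 10, 0, 17)

nonzero_pos <- which(x != 0)      # positions of the non-zero entries: 2 and 4
first_pos <- nonzero_pos[1L]      # position of the first non-zero entry: 2
kept <- x[first_pos:length(x)]    # values from that position to the end: 10 0 17

var(kept)                         # same as var(c(10, 0, 17))
```

The zero between 10 and 17 stays in the subset, which is what distinguishes this from dropping all zeros with x[x > 0].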

We can easily validate the results against your rules:

var(c(12, 21, 18))
## [1] 21
var(c(12, 0, 0, 5))
## [1] 32.25
var(c(10, 0, 17))
## [1] 73
var(c(8, 29))
## [1] 220.5

Upvotes: 3
