FooBar
FooBar

Reputation: 16518

Stata: Running sum with missing values

I would like to get the running sum over j, e.g. for row _n and j=3 I would like to get x[_n] + x[_n+1] + x[_n+2].

I thought to do the following trick:

    by grouping: gen foo = sum(log(x))
    by grouping: replace foo = foo - foo[_n - `y'] if _n > `y'

However, this is not robust to missing values. sum() generates 0 for a missing value, while I need the cumulative sums to be . if at least one of the values to be summed over is .. How would I achieve that?

I need it to be working for large j too, so manually adding these up is not feasible.

Upvotes: 1

Views: 2228

Answers (2)

Nick Cox
Nick Cox

Reputation: 37278

Stealing Roberto's reproducible example (given none such in the post) another way to do this is using tsegen (SSC), which must be installed with ssc inst tsegen. It is essential to tsset or xtset the dataset. tsegen has wired into the syntax a way of stipulating the minimum number of non-missing arguments, here all of them.

clear
set more off

input ///
id myvar
1 3 
1 4 
1 . 
1 8 
1 7
1 2
1 .
1 8
2 3 
2 4 
2 . 
2 8 
2 7
2 2
2 .
2 8
end

sort id, stable
by id: gen t = _n 
tsset id t 

tsegen run = rowtotal(F(0/2).myvar, 3) 

list , sepby(id) 

     +----------------------+
     | id   myvar   t   run |
     |----------------------|
  1. |  1       3   1     . |
  2. |  1       4   2     . |
  3. |  1       .   3     . |
  4. |  1       8   4    17 |
  5. |  1       7   5     . |
  6. |  1       2   6     . |
  7. |  1       .   7     . |
  8. |  1       8   8     . |
     |----------------------|
  9. |  2       3   1     . |
 10. |  2       4   2     . |
 11. |  2       .   3     . |
 12. |  2       8   4    17 |
 13. |  2       7   5     . |
 14. |  2       2   6     . |
 15. |  2       .   7     . |
 16. |  2       8   8     . |
     +----------------------+

Upvotes: 0

Roberto Ferrer
Roberto Ferrer

Reputation: 11102

There are several ways to interpret and work it out:

clear
set more off

input ///
id myvar
1 3 
1 4 
1 . 
1 8 
1 7
1 2
1 .
1 8
2 3 
2 4 
2 . 
2 8 
2 7
2 2
2 .
2 8
end

// this?
bysort id: gen s = cond(!missing(myvar), sum(myvar), .)

// or this? (or something else?)
clonevar s2 = s
bysort id: replace s2 = . if missing(s2[_n-1]) & _n > 1

list, sepby(id)

It's not clear to me if you want any of these.

Upvotes: 2

Related Questions