hkk

Reputation: 13

Stata help: filling in missing values of a variable from the next observation

Right now my data looks something like this. I want to fill in each missing value of logrd with logrd[_n+1] - avgrdgr[_n+1], working backwards from the last observation.

   age avgrdgr logrd 
    -37    0.1    .
          ...
    -3    -0.2    .
    -2    -0.1    .
    -1     0.3    .
     0     0.4    .
     1     0.1    . 
     2     0.6    .
     3     0.5    1

So the result should look like this...

   age avgrdgr logrd 
    -37    0.1    0.3

          ...

    -3    -0.2    -0.8
    -2    -0.1    -0.9
    -1     0.3    -0.6
     0     0.4    -0.2
     1     0.1    -0.1 
     2     0.6    0.5
     3     0.5    1

I tried to do this with a loop like this:

      foreach x of logrd & y of avgrdgr{
          if missing(`x'){
          bys cus: replace `x' = `x'[_n+1] - `y'[_n+1] 
             }
          }

This is my first time trying to write a loop by myself, and I am stuck. Please help.

Upvotes: 0

Views: 82

Answers (3)

Nick Cox

Reputation: 37183

That's a long way from legal foreach syntax. It's simple but crucial to note that whatever is not allowed in the foreach syntax diagram is forbidden. Further, note that the if statement doesn't work as you are expecting. There isn't a different result to the evaluation each time round the loop. if missing(whatever) as a command only ever means if missing(whatever[1]).
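To illustrate both points, here is a minimal sketch. The variable x and its values are made up for illustration and are not from the question. It shows a legal foreach over a variable list, then the difference between if as a command and if as a qualifier.

```stata
clear
input x
.
2
3
end

* Legal foreach syntax: one loop macro over one list
foreach v of varlist x {
    count if missing(`v')
}

* if as a command is evaluated once, using the first observation only;
* this is the same as: if missing(x[1])
if missing(x) {
    display "x is missing in observation 1"
}

* if as a qualifier is evaluated separately for every observation
replace x = 0 if missing(x)
```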

Unless your example simplifies away key details, there is just one loop, so this should suffice.

clear 
input  age avgrdgr logrd 
    -3    -0.2    .
    -2    -0.1    .
    -1     0.3    .
     0     0.4    .
     1     0.1    . 
     2     0.6    .
     3     0.5    1
end 

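* Work backwards from age 2 to age -3, so that logrd[_n+1] is already filled in
* when it is needed (assumes the data are sorted by age)
quietly forval age = 2(-1)-3 {
   replace logrd = logrd[_n+1] - avgrdgr[_n+1] if age == `age' 
} 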

list, sep(0)  

     +-----------------------+
     | age   avgrdgr   logrd |
     |-----------------------|
  1. |  -3       -.2     -.8 |
  2. |  -2       -.1     -.9 |
  3. |  -1        .3     -.6 |
  4. |   0        .4     -.2 |
  5. |   1        .1     -.1 |
  6. |   2        .6      .5 |
  7. |   3        .5       1 |
     +-----------------------+

I toyed with reversing the order, as in @William Lisowski's answer, but this one worked too.

Upvotes: 0

user4690969

Perhaps the following data, which includes an apparent identifier used in your sample code but not present in your sample data, looks something more like your data.

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(cus age) float(avgrdgr logrd)
1 -3 -.2  .
1 -2 -.1  .
1 -1  .3  .
1  0  .4  .
1  1  .1  .
1  2  .6  .
1  3  .5  1
2 -3 -.2  .
2 -2 -.1  .
2 -1  .3  .
2  0  .4  .
2  1  .1  .
2  2  .6  .
2  3  .5  0
end
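* Within each cus, sort from the highest age to the lowest, so that the
* previous observation ([_n-1]) is the one with the next higher age
generate negage = -age
bysort cus (negage): replace logrd = logrd[_n-1]-avgrdgr[_n-1] if missing(logrd)
drop negage
* Restore the original order
sort cus age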

Which results in

. list, sepby(cus)

     +-----------------------------+
     | cus   age   avgrdgr   logrd |
     |-----------------------------|
  1. |   1    -3       -.2     -.8 |
  2. |   1    -2       -.1     -.9 |
  3. |   1    -1        .3     -.6 |
  4. |   1     0        .4     -.2 |
  5. |   1     1        .1     -.1 |
  6. |   1     2        .6      .5 |
  7. |   1     3        .5       1 |
     |-----------------------------|
  8. |   2    -3       -.2    -1.8 |
  9. |   2    -2       -.1    -1.9 |
 10. |   2    -1        .3    -1.6 |
 11. |   2     0        .4    -1.2 |
 12. |   2     1        .1    -1.1 |
 13. |   2     2        .6     -.5 |
 14. |   2     3        .5       0 |
     +-----------------------------+

Upvotes: 0

user4690969

You don't need a loop. But since Stata works through your data from first to last observation, you need to temporarily reverse your data so that the later observations come before the earlier observations that you want to fill in. Here's some code that's something like what you can use.

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte age float avgrdgr float logrd
-3 -.2 .
-2 -.1 .
-1  .3 .
 0  .4 .
 1  .1 .
 2  .6 .
 3  .5 1
end
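* Reverse the data so that later ages come first
gsort -age
* Each filled-in value is then available to the observation that follows it
replace logrd = logrd[_n-1]-avgrdgr[_n-1] if missing(logrd)
* Restore the original order
sort age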

Which results in

. list, clean

       age   avgrdgr   logrd  
  1.    -3       -.2     -.8  
  2.    -2       -.1     -.9  
  3.    -1        .3     -.6  
  4.     0        .4     -.2  
  5.     1        .1     -.1  
  6.     2        .6      .5  
  7.     3        .5       1  
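To see why the reversal matters, here is a small sketch. The variables x, y and order are hypothetical, made up for illustration. Because replace works through the observations in order, a reference to [_n+1] sees a value that has not been filled in yet, whereas after reversing, [_n-1] sees the value that was just filled in.

```stata
clear
input x
.
.
3
end
generate y = x
generate long order = _n

* Forward reference: when observation 1 is processed, x[2] is still missing,
* so only observation 2 gets filled in
replace x = x[_n+1] if missing(x)

* Reversed data with a backward reference: each newly filled value is
* available to the next observation, so the fill cascades through
gsort -order
replace y = y[_n-1] if missing(y)
sort order

list order x y
```

In this sketch x is still missing in the first observation, while y is filled in everywhere.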

Upvotes: 1
