Dylan

SPSS: counting the first run of consecutive zeros across variables

I want to count the days on which a subject did not receive treatment (a "0" in my file). If a subject did receive treatment, it is denoted with "1". A subject can receive multiple courses of treatment, and I would like to count the days between the first and second treatments. I am not (yet) interested in the time between the second and third treatments. Basically, my SPSS file looks like this:

id  day1  day2  day3  day4  day28
A   1     0     0     1     0
B   1     0     1     0     1
C   etc.

I am only interested in the first series of zeros. The output I hope to get is:

id  first_series_zero
A   2
B   1
C   ...

Can anyone help me out here? Obviously, just counting all the zeros isn't going to work, because there might be multiple runs of zeros in one row.
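To pin down the desired output, here is a plain-Python sketch (not SPSS syntax) of "the first run of zeros": scan a row in day order and count the zeros between the first and second 1. It assumes each row is a list of 0/1 values in day order; the function name `first_series_zero` and the choice to return `None` when a row has no second treatment are illustrative assumptions, since the question does not say what should happen in that case.

```python
from typing import Optional, Sequence


def first_series_zero(days: Sequence[int]) -> Optional[int]:
    """Count the zeros between the first and second treatment (1) in one row.

    Zeros before the first treatment are ignored. Returns None if the row
    has fewer than two treatments (an assumption, not from the question).
    """
    count = 0
    seen_first = False
    for value in days:
        if value == 1:
            if seen_first:
                return count  # reached the second treatment
            seen_first = True
        elif seen_first and value == 0:
            count += 1  # a no-treatment day after the first treatment
    return None  # no second treatment in this row


rows = {"A": [1, 0, 0, 1, 0], "B": [1, 0, 1, 0, 1]}
for subject, days in rows.items():
    print(subject, first_series_zero(days))
# A 2
# B 1
```

A single left-to-right scan that stops at the second treatment is enough here because only the first run matters.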

Cheers, Dylan

Answers (1)

Andy W

Here is a fairly general approach that lets you calculate the time between every pair of consecutive treatments. First, I create a vector, Loc1 TO Loc5, that stores the day on which each treatment occurs, in order (using day1 to day5 as an example).

DATA LIST FREE / day1 day2 day3 day4 day5.
BEGIN DATA
1 0 0 1 0
1 0 1 0 1
END DATA.

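* Store the day number of each treatment (each 1) in Loc1, Loc2, and so on, in order.
VECTOR day = day1 TO day5.
VECTOR Loc(5,F2.0).
* #id points to the next empty Loc slot and is reset to 1 for each case.
COMPUTE #id = 1.
LOOP #i = 1 TO 5.
  DO IF day(#i) = 1.
    COMPUTE Loc(#id) = #i.
    COMPUTE #id = #id + 1.
  END IF.
END LOOP.
EXECUTE.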

After you run this transformation, the Loc variables look like this for the example data:

Loc1 Loc2 Loc3 Loc4 Loc5
  1    4    .    .    .
  1    3    5    .    .
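For checking the logic outside SPSS, here is a plain-Python sketch of the same LOOP: walk the days in order and write the day number of each 1 into the next free slot. The function name `treatment_locations` is hypothetical, and `math.nan` is used as a stand-in for SPSS system-missing (".") in the unused slots.

```python
import math
from typing import List, Sequence


def treatment_locations(days: Sequence[int]) -> List[float]:
    """Mimic the SPSS LOOP: record the 1-based day number of each 1, in order.

    Unused slots stay NaN, standing in for SPSS system-missing values.
    """
    loc = [math.nan] * len(days)  # like VECTOR Loc(5,F2.0), one slot per day
    slot = 0  # plays the role of the scratch variable #id (0-based here)
    for day_number, value in enumerate(days, start=1):
        if value == 1:
            loc[slot] = day_number
            slot += 1
    return loc


print(treatment_locations([1, 0, 0, 1, 0]))  # [1, 4, nan, nan, nan]
print(treatment_locations([1, 0, 1, 0, 1]))  # [1, 3, 5, nan, nan]
```

Sizing the list to the number of days guarantees there is always a free slot, just as Loc1 TO Loc5 can hold at most five treatments for day1 TO day5.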

Calculating the number of zeros in the first series is then as simple as:

COMPUTE first_series_zero = Loc2 - Loc1 - 1.

This returns system-missing if there is no second (or first) treatment, and it does not assume that the first treatment falls on day1. Calculating the gaps between all consecutive treatments is just as simple; here is a DO REPEAT approach:

VECTOR DifS(4,F2.0).
DO REPEAT F = Loc1 TO Loc4 /B = Loc2 TO Loc5 /D = DifS1 TO DifS4.
  COMPUTE D = B - F - 1.
END REPEAT.

So DifS1 holds the number of zeros between the 1st and 2nd treatments, DifS2 the number between the 2nd and 3rd treatments, and so on. (Both this DO REPEAT and the first LOOP could be made more efficient by looping only over valid/possible values.)
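The DO REPEAT pairs each Loc value with the next one, so every gap is DifS_k = Loc_(k+1) - Loc_k - 1. A plain-Python sketch of that pairing follows; the function name `gaps_between_treatments` is hypothetical, and `math.nan` stands in for system-missing, which propagates through the arithmetic much as missing values do in an SPSS COMPUTE. The first element corresponds to first_series_zero.

```python
import math
from typing import List, Sequence


def gaps_between_treatments(loc: Sequence[float]) -> List[float]:
    """Mimic the DO REPEAT: gap k = loc[k+1] - loc[k] - 1 for each adjacent pair.

    NaN (standing in for system-missing) makes the corresponding gap NaN.
    """
    return [after - before - 1 for before, after in zip(loc[:-1], loc[1:])]


nan = math.nan
print(gaps_between_treatments([1, 4, nan, nan, nan]))  # [2, nan, nan, nan]
print(gaps_between_treatments([1, 3, 5, nan, nan]))    # [1, 1, nan, nan]
```

Five Loc values yield four gaps, matching DifS1 TO DifS4.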
