user42459
user42459

Reputation: 915

Creating a variable that increments by one if new value found in another

I have the following (sorted) variable:

35 
35     
37 
37 
37 
40 

I want to create a new variable which will increment by one when a new number comes up in the original variable.

For example:

35    1
35    1    
37    2
37    2
37    2
40    3

I thought about using the by or bysort commands but none of them seems to solve the problem. This looks like something many people need, but I couldn't find an answer.

Upvotes: 0

Views: 1559

Answers (2)

user8682794
user8682794

Reputation:

@Nick beat me to it by a couple of minutes but here's another -cleaner- way of doing this:

clear

input foo
35 
35     
37 
37 
37 
40
end

egen counter = group(foo)

list

     +---------------+
     | foo   counter |
     |---------------|
  1. |  35         1 |
  2. |  35         1 |
  3. |  37         2 |
  4. |  37         2 |
  5. |  37         2 |
     |---------------|
  6. |  40         3 |
     +---------------+

This approach uses the egen command and its associated group() function.

There are also a couple of options for this function, with missing being perhaps the most useful.

From the command's help file:

"...missing indicates that missing values in varlist (either . or "") are to be treated like any other value when assigning groups, instead of as missing values being assigned to the group missing..."

clear

input foo
35 
35
.     
37 
37 
37 
40
.
end

egen counter = group(foo), missing

sort foo
list

     +---------------+
     | foo   counter |
     |---------------|
  1. |  35         1 |
  2. |  35         1 |
  3. |  37         2 |
  4. |  37         2 |
  5. |  37         2 |
     |---------------|
  6. |  40         3 |
  7. |   .         4 |
  8. |   .         4 |
     +---------------+

Instead of:

drop counter
egen counter = group(foo)

sort foo
list

     +---------------+
     | foo   counter |
     |---------------|
  1. |  35         1 |
  2. |  35         1 |
  3. |  37         2 |
  4. |  37         2 |
  5. |  37         2 |
     |---------------|
  6. |  40         3 |
  7. |   .         . |
  8. |   .         . |
     +---------------+

Another option is label:

"... The label option returns integers from 1 up according to the distinct groups of varlist in sorted order. The integers are labeled with the values of varlist or the value labels, if they exist..."

Using the example without the missing values:

egen counter = group(foo), label

list

     +---------------+
     | foo   counter |
     |---------------|
  1. |  35        35 |
  2. |  35        35 |
  3. |  37        37 |
  4. |  37        37 |
  5. |  37        37 |
     |---------------|
  6. |  40        40 |
     +---------------+

Upvotes: 1

Nick Cox
Nick Cox

Reputation: 37208

You are just counting how often a value differs from the previous value. This works also for observation 1 as any reference to a value for observation 0 is returned as missing, so in your example 35 is not equal to missing.

clear
input x 
35 
35     
37 
37 
37 
40 
end 

gen new = sum(x != x[_n-1]) 
list, sepby(new) 

     +----------+
     |  x   new |
     |----------|
  1. | 35     1 |
  2. | 35     1 |
     |----------|
  3. | 37     2 |
  4. | 37     2 |
  5. | 37     2 |
     |----------|
  6. | 40     3 |
     +----------+

by would be pertinent if you had blocks of observations to be treated separately. One underlying principle here is that true or false comparisons (here, whether two values are unequal) are evaluated as 1 if true and 0 is false.

Upvotes: 2

Related Questions