Reputation: 915
I have the following (sorted) variable:
35
35
37
37
37
40
I want to create a new variable which will increment by one when a new number comes up in the original variable.
For example:
35 1
35 1
37 2
37 2
37 2
40 3
I thought about using the by
or bysort
commands but none of them seems to solve the problem. This looks like something many people need, but I couldn't find an answer.
Upvotes: 0
Views: 1559
Reputation:
@Nick beat me to it by a couple of minutes but here's another -cleaner- way of doing this:
clear
input foo
35
35
37
37
37
40
end
egen counter = group(foo)
list
+---------------+
| foo counter |
|---------------|
1. | 35 1 |
2. | 35 1 |
3. | 37 2 |
4. | 37 2 |
5. | 37 2 |
|---------------|
6. | 40 3 |
+---------------+
This approach uses the egen
command and its associated group()
function.
There are also a couple of options for this function, with missing
being perhaps the most useful.
From the command's help file:
"...missing indicates that missing values in varlist (either . or "") are to be treated like any other value when assigning groups, instead of as missing values being assigned to the group missing..."
clear
input foo
35
35
.
37
37
37
40
.
end
egen counter = group(foo), missing
sort foo
list
+---------------+
| foo counter |
|---------------|
1. | 35 1 |
2. | 35 1 |
3. | 37 2 |
4. | 37 2 |
5. | 37 2 |
|---------------|
6. | 40 3 |
7. | . 4 |
8. | . 4 |
+---------------+
Instead of:
drop counter
egen counter = group(foo)
sort foo
list
+---------------+
| foo counter |
|---------------|
1. | 35 1 |
2. | 35 1 |
3. | 37 2 |
4. | 37 2 |
5. | 37 2 |
|---------------|
6. | 40 3 |
7. | . . |
8. | . . |
+---------------+
Another option is label
:
"... The label option returns integers from 1 up according to the distinct groups of varlist in sorted order. The integers are labeled with the values of varlist or the value labels, if they exist..."
Using the example without the missing values:
egen counter = group(foo), label
list
+---------------+
| foo counter |
|---------------|
1. | 35 35 |
2. | 35 35 |
3. | 37 37 |
4. | 37 37 |
5. | 37 37 |
|---------------|
6. | 40 40 |
+---------------+
Upvotes: 1
Reputation: 37208
You are just counting how often a value differs from the previous value. This works also for observation 1 as any reference to a value for observation 0 is returned as missing, so in your example 35 is not equal to missing.
clear
input x
35
35
37
37
37
40
end
gen new = sum(x != x[_n-1])
list, sepby(new)
+----------+
| x new |
|----------|
1. | 35 1 |
2. | 35 1 |
|----------|
3. | 37 2 |
4. | 37 2 |
5. | 37 2 |
|----------|
6. | 40 3 |
+----------+
by
would be pertinent if you had blocks of observations to be treated separately. One underlying principle here is that true or false comparisons (here, whether two values are unequal) are evaluated as 1 if true and 0 is false.
Upvotes: 2