Reputation: 11
I have a list of places with population, much like in the example data below:
sysuse census, clear
How can I combine (sum) only two observations to create a new observation, while maintaining the rest of the data?
In the below example I would like to combine Alabama and Alaska to create a new observation called 'Alabama & Alaska' with the sum of their populations.
With the new observation, the previous records will need to be deleted.
+----------------------------+
| state pop |
|----------------------------|
1. | Alabama 3,893,888 |
2. | Alaska 401,851 |
3. | Arizona 2,718,215 |
4. | Arkansas 2,286,435 |
5. | California 23,667,902 |
+----------------------------+
+-----------------------------------+
| state pop |
|-----------------------------------|
1. | Alabama & Alaska 4,295,739 | <--Alabama & Alaska combined
2. | Arizona 2,718,215 | <--Retain other observations and variables
3. | Arkansas 2,286,435 |
4. | California 23,667,902 |
+-----------------------------------+
This is my original toy data example and its expected output:
PlaceName Population
Town 1 100
Town 2 200
Town 3 100
Town 4 100
PlaceName Population
Town 1 & Town 2 300
Town 3 100
Town 4 100
Upvotes: 0
Views: 1381
Reputation: 37208
This is how to do it with collapse
. As you ask, this combines two observations into one, and thus changes the dataset.
clear
input str6 PlaceName Population
"Town 1" 100
"Town 2" 200
"Town 3" 100
"Town 4" 100
end
replace PlaceName = "Towns 1 and 2" in 1/2
collapse (sum) Population , by(PlaceName)
list
+--------------------------+
| PlaceName Popula~n |
|--------------------------|
1. | Town 3 100 |
2. | Town 4 100 |
3. | Towns 1 and 2 300 |
+--------------------------+
Upvotes: 0
Reputation:
Using your original toy example, the following works for me:
clear
input str6 PlaceName Population
"Town 1" 100
"Town 2" 200
"Town 3" 100
"Town 4" 100
end
generate PlaceName2 = cond(_n == 1, PlaceName + " & " + PlaceName[_n+1], PlaceName)
generate Population2 = cond(_n == 1, Population[_n+1] + Population, Population)
replace PlaceName2 = "" in 2
replace Population2 = . in 2
gsort - Population2
list, abbreviate(12)
+--------------------------------------------------------+
| PlaceName Population PlaceName2 Population2 |
|--------------------------------------------------------|
1. | Town 1 100 Town 1 & Town 2 300 |
2. | Town 4 100 Town 4 100 |
3. | Town 3 100 Town 3 100 |
4. | Town 2 200 . |
+--------------------------------------------------------+
Upvotes: 1