Reputation: 111
I have a panel dataset and want to create groups from the data.
egen AUX = cut(variable), group(5)
This creates groups with (almost) the same number of units in each. What I would like now is to have groups such that, in every group, the total of another variable is the same. For example, I want to group households such that every bin has the same total income.
How can I set up such a command?
Upvotes: 0
Views: 1135
Reputation: 37278
There is no data example here, and it's not clear how the panel structure enters. For example, do you want to pool households across all years, or take each year separately?
Either way, the technique is to split according to fractions of the cumulative sum. A detail is that identical values should be assigned to the same bin. Because whole observations are assigned to bins, the bin totals will be only approximately equal, as the output below shows.
sysuse auto, clear
bysort foreign (price) : gen runningsum = sum(price)
* same values belong together
bysort foreign price (runningsum) : replace runningsum = runningsum[_N]
by foreign : gen quintile = ceil(5 * runningsum/runningsum[_N])
bysort foreign quintile : egen qtotal = total(price)
list price qtotal quintile if foreign, sepby(quintile)
     +----------------------------+
     |  price   qtotal   quintile |
     |----------------------------|
 53. |  3,748    24231          1 |
 54. |  3,798    24231          1 |
 55. |  3,895    24231          1 |
 56. |  3,995    24231          1 |
 57. |  4,296    24231          1 |
 58. |  4,499    24231          1 |
     |----------------------------|
 59. |  4,589    31280          2 |
 60. |  4,697    31280          2 |
 61. |  5,079    31280          2 |
 62. |  5,397    31280          2 |
 63. |  5,719    31280          2 |
 64. |  5,799    31280          2 |
     |----------------------------|
 65. |  5,899    25273          3 |
 66. |  6,229    25273          3 |
 67. |  6,295    25273          3 |
 68. |  6,850    25273          3 |
     |----------------------------|
 69. |  7,140    24959          4 |
 70. |  8,129    24959          4 |
 71. |  9,690    24959          4 |
     |----------------------------|
 72. |  9,735    34720          5 |
 73. | 11,995    34720          5 |
 74. | 12,990    34720          5 |
     +----------------------------+
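For a household panel, the same idea might carry over along the following lines. This is only a sketch: the variable names `income` and `year` are hypothetical (the question gives no data example), and it assumes incomes are strictly positive with no missing values. The first variant bins within each year, the second pools all years.

```stata
* Hypothetical variables: income (household income), year (panel wave)
* Assumes income > 0 and no missing values

* Variant 1: five bins of roughly equal total income within each year
bysort year (income) : gen double cumincome = sum(income)
* tied incomes get the same cumulative sum, hence the same bin
bysort year income (cumincome) : replace cumincome = cumincome[_N]
by year : gen bin = ceil(5 * cumincome / cumincome[_N])

* Variant 2: five bins of roughly equal total income, pooling all years
sort income
gen double cumpooled = sum(income)
bysort income (cumpooled) : replace cumpooled = cumpooled[_N]
gen binpooled = ceil(5 * cumpooled / cumpooled[_N])

* Check how close the bin totals are within each year
bysort year bin : egen bintotal = total(income)
```

If zero incomes can occur, the lowest households would have a cumulative sum of 0 and so fall into bin 0, which would need separate handling.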
Upvotes: 1