Reputation: 23
I am trying to reshape a dataset in Stata to a long format.
However, the original dataset is already in a long format with multiple observations for each given individual. Each row contains all tests taken in a given year by a given student.
What I want is each line to contain data from a specific test in a given year.
For example, from this dataset:
+---------------------------------------------------------------------------+
| student_id | year | score_math | date_math | score_english | date_english |
|------------+------+------------+-----------+---------------+--------------|
| 111 | 2011 | . | | . | |
| 111 | 2013 | 259 | apr | 250 | apr |
| 222 | 2012 | 645 | mar | 645 | mar |
| 222 | 2014 | 640 | dec | . | |
| 333 | 2016 | . | | 358 | apr |
| 333 | 2017 | . | | 299 | jan |
+---------------------------------------------------------------------------+
I would like to get the one below:
+----------------------------------------------+
| student_id | year | test_name | score | date |
|------------+------+-----------+-------+------|
| 111 | 2011 | english | . | |
| 111 | 2011 | math | . | |
| 111 | 2013 | english | 250 | apr |
| 111 | 2013 | math | 259 | apr |
| 222 | 2012 | english | 645 | mar |
| 222 | 2012 | math | 645 | mar |
| 222 | 2014 | english | . | |
| 222 | 2014 | math | 640 | dec |
| 333 | 2016 | english | 358 | apr |
| 333 | 2016 | math | . | |
| 333 | 2017 | english | 299 | jan |
| 333 | 2017 | math | . | |
+----------------------------------------------+
I have tried running the following command but it does not work because the data is long:
reshape long score* date*, i(student_id) j(test_name)
What I get is the following message:
variable test_name contains all missing values
So I tried instead:
reshape wide score* date*, i(student_id) j(year)
Next, to 'pretend' my data was long (so that I could be able to reshape long) I used:
egen new_id = group(student_id year)
After having a unique new_id
for each observation I then tried:
reshape long score* date*, i(new_id) j(test_name)
This did not work either.
Any suggestions on how I could get the desired output?
Upvotes: 1
Views: 430
Reputation:
The following produces the desired output:
clear
input student_id year score_math str3 date_math score_english str3 date_english
111 2011 . " " . " "
111 2013 259 "apr" 250 "apr"
222 2012 645 "mar" 645 "mar"
222 2014 640 "dec" . " "
333 2016 . " " 358 "apr"
333 2017 . " " 299 "jan"
end
reshape long score_ date_, i(student_id year) j(test_name) string
rename *_ *
list, sepby(student_id) abbreviate(30)
+----------------------------------------------+
| student_id year test_name score date |
|----------------------------------------------|
1. | 111 2011 english . |
2. | 111 2011 math . |
3. | 111 2013 english 250 apr |
4. | 111 2013 math 259 apr |
|----------------------------------------------|
5. | 222 2012 english 645 mar |
6. | 222 2012 math 645 mar |
7. | 222 2014 english . |
8. | 222 2014 math 640 dec |
|----------------------------------------------|
9. | 333 2016 english 358 apr |
10. | 333 2016 math . |
11. | 333 2017 english 299 jan |
12. | 333 2017 math . |
+----------------------------------------------+
Upvotes: 1