Reshaping long from long

Question

I am trying to reshape a dataset in Stata to a long format.

However, the original dataset is already in a long format with multiple observations for each given individual. Each row contains all tests taken in a given year by a given student.

What I want is each line to contain data from a specific test in a given year.

For example, from this dataset:

  +---------------------------------------------------------------------------+
  | student_id | year | score_math | date_math | score_english | date_english |
  |------------+------+------------+-----------+---------------+--------------|
  |        111 | 2011 |          . |           |             . |              |
  |        111 | 2013 |        259 |       apr |           250 |          apr |
  |        222 | 2012 |        645 |       mar |           645 |          mar |
  |        222 | 2014 |        640 |       dec |             . |              |
  |        333 | 2016 |          . |           |           358 |          apr |
  |        333 | 2017 |          . |           |           299 |          jan |
  +---------------------------------------------------------------------------+

I would like to get the one below:

  +----------------------------------------------+
  | student_id | year | test_name | score | date |
  |------------+------+-----------+-------+------|
  |        111 | 2011 |   english |     . |      |
  |        111 | 2011 |      math |     . |      |
  |        111 | 2013 |   english |   250 |  apr |
  |        111 | 2013 |      math |   259 |  apr |
  |        222 | 2012 |   english |   645 |  mar |
  |        222 | 2012 |      math |   645 |  mar |
  |        222 | 2014 |   english |     . |      |
  |        222 | 2014 |      math |   640 |  dec |
  |        333 | 2016 |   english |   358 |  apr |
  |        333 | 2016 |      math |     . |      |
  |        333 | 2017 |   english |   299 |  jan |
  |        333 | 2017 |      math |     . |      |
  +----------------------------------------------+

I have tried running the following command but it does not work because the data is long:

reshape long score* date*, i(student_id) j(test_name)

What I get is the following message:

variable test_name contains all missing values

So I tried instead:

reshape wide score* date*, i(student_id) j(year)

Next, to 'pretend' my data was long (so that I could be able to reshape long) I used:

egen new_id = group(student_id year)

After having a unique new_id for each observation I then tried:

reshape long score* date*, i(new_id) j(test_name)

This did not work either.

Any suggestions on how I could get the desired output?

Reshaping long from long

Answers (1)

Related Questions