Reputation: 705
I have problems in reshaping data from wide to long format:
I was hoping that this Stata FAQ could help me, but unfortunately this does not work properly (see following code snippets).
So I do have data that looks like the following example:
clear
input str45 Year
"2010"
"2011"
"2012"
"2014"
end
input str45 A101Meas0010
"1.50"
"1.70"
"1.71"
"1.71"
input str45 A101Meas0020
"50"
"60"
"65"
"64"
input str45 A101Meas0020A
"51"
"62"
"64"
"68"
input str45 FE123Meas0010
"1.60"
"1.75"
"1.92"
"1.94"
input str45 FE123Meas0020
"60"
"72"
"88"
"92"
list
+-------------------------------------------------------------+
| Year A10~0010 A10~0020 A1~0020A FE1~0010 FE1~0020 |
|-------------------------------------------------------------|
1. | 2010 1.50 50 51 1.60 60 |
2. | 2011 1.70 60 62 1.75 72 |
3. | 2012 1.71 65 64 1.92 88 |
4. | 2014 1.71 64 68 1.94 92 |
+-------------------------------------------------------------+
The final table I want to achieve would look something like this:
+--------------------------------------------------+
| Year ID Meas0010 Meas0020 Meas0020A |
|--------------------------------------------------|
1. | 2010 A101 1.50 50 . |
2. | 2010 FE123 1.60 51 60 |
3. | 2011 A101 1.70 60 . |
4. | 2011 FE123 1.75 62 72 |
5. | 2012 A101 1.71 65 . |
6. | 2012 FE123 1.92 64 88 |
7. | 2014 A101 1.71 64 . |
8. | 2014 FE123 1.94 68 92 |
+--------------------------------------------------+
I tried following code snippet close to the example from the Stata FAQ, but this throws an error:
unab vars : *Meas*
local stubs : subinstr local vars "Meas0010" "", all
local stubs : subinstr local stubs "Meas0020" "", all
local stubs : subinstr local stubs "Meas0020A" "", all
reshape long "`stubs'", i(Year) j(Measurement) string
(note: j = Meas0010 Meas0020 Meas0020A)
(note: A101AMeas0010 not found)
variable A101Meas0010 not found
r(111);
Any ideas how to reshape this? I never had to reshape such an odd structure before.
Additional Question: In the example above I did have to specify the Measurement-Names Meas0010
, Meas0020
and Meas0020A
. Is it possible to automate this as well? All measurement names start with the keyword Meas
, so the variable names are always of the structure _ID+Meas
Name, e.g. A101Meas0020A
stands for ID A101
and Measurement Meas0020A
.
The annoying thing is: I do know how to do this in MATLAB, but I am forced to use Stata here.
Upvotes: 0
Views: 1802
Reputation: 37288
Your variable name structure is a little awkward, but there is a syntax to match. It's better covered in the help for reshape
, and is only barely mentioned in the FAQ you cite (which I wrote, so I can be emphatic that it's intended as a supplement to the help, not the first line of documentation).
Your example yields to
clear
input str4 (Year A101Meas0010 A101Meas0020 A101Meas0020A FE123Meas0010 FE123Meas0020)
"2010" "1.50" "50" "51" "1.60" "50"
"2011" "1.70" "60" "62" "1.75" "60"
"2012" "1.71" "65" "64" "1.92" "65"
"2014" "1.71" "64" "68" "1.94" "64"
end
reshape long @Meas0010 @Meas0020 @Meas0020A, i(Year) j(ID) string
destring, replace
sort Year ID
list, sepby(Year)
+-----------------------------------------------+
| Year ID Meas0010 Meas0020 Me~0020A |
|-----------------------------------------------|
1. | 2010 A101 1.5 50 51 |
2. | 2010 FE123 1.6 50 . |
|-----------------------------------------------|
3. | 2011 A101 1.7 60 62 |
4. | 2011 FE123 1.75 60 . |
|-----------------------------------------------|
5. | 2012 A101 1.71 65 64 |
6. | 2012 FE123 1.92 65 . |
|-----------------------------------------------|
7. | 2014 A101 1.71 64 68 |
8. | 2014 FE123 1.94 64 . |
+-----------------------------------------------+
It seems bizarre that your example enters everything as string: note the destring
in my code.
Without access to your dataset, I'd say that you should be able to find the more general syntax without automation. You know that there are at most about 10 measurements in the fullest case. In any event you are already showing the syntax tricks needed to remove strings you don't need.
Upvotes: 0