Reputation: 2527
I received data with a string variable that looks something like:
var_name
25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30
01-APR-11: A25, B82, C65
04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82
12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54
27-OCT-15: A22, B95, C08
And so on. My goal is to split these strings up into different variable names. The variable names would be v1_date
, v1_A
, v1_B
, v1_C
, v2_date
, v2_A
, v2_B
, v2_C
, v3_date
, v3_A
, v3_B
, v3_C
.
I can use split var_name, p(";")
, rename to be v1
, v2
, and v3
, and then split
again to do this. But the problem is that I want v1
, v2
, and v3
to be in chronological order based on the date and the data is not currently arranged in that fashion. How can I make it so that the date of v1
comes before v2
and the date of v2
comes before the date of v3
? For example in the first observation, I want 25-DEC-99: A11, B14, C89
to be associated with v2
and 28-FEB-94: A27, B94, C30
to be associated with v1
.
Upvotes: 0
Views: 88
Reputation: 11102
The following gets you close, I believe. It uses both split
and reshape
.
clear
set more off
input ///
str100 myvar
"25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30"
"01-APR-11: A25, B82, C65"
"04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82"
"12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54"
"27-OCT-15: A22, B95, C08"
end
split myvar, p(;)
drop myvar
gen obs = _n
reshape long myvar, i(obs)
drop if missing(myvar)
split myvar, p(:)
drop myvar
gen myvar11 = date(myvar1, "DMY", 2020)
format %td myvar11
drop myvar1
rename (myvar11 myvar2) (mydate mycells)
order mydate, before(mycells)
bysort obs (mydate) : gen neworder = _n
drop _j
reshape wide mydate mycells, i(obs) j(neworder)
list
You can loop over the mycells
variables if you need to further split
them.
Upvotes: 1
Reputation: 37183
In general, please consider using dataex
(SSC) to create easy data examples.
You don't give all the (not trivial) code you used to split
the variables. As it happens, I don't think your variable names are easy to work with, so I re-created the split in my own fashion. If you reshape long
the split data, then sorting by date is easy, but I have pulled up short of the reverse reshape wide
, as I suspect the long structure is much easier to work with.
clear
input str80 data
"25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30"
"01-APR-11: A25, B82, C65"
"04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82"
"12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54"
"27-OCT-15: A22, B95, C08"
end
split data, p(;) gen(x)
local j = 1
gen work = ""
foreach x of var x* {
replace work = substr(`x', 1, strpos(`x', ":") - 1)
gen date`j' = daily(work, "DMY", 2050)
replace work = substr(`x', strpos(`x', ":") + 1, .)
split work, p(,)
rename (work1 work2 work3) (vA`j' vB`j' vC`j')
local ++j
}
drop work
drop x*
drop data
gen id = _n
edit
reshape long date vA vB vC, i(id) j(which)
drop if missing(date)
bysort id (date): replace which = _n
list, sepby(id)
+----------------------------------------+
| id which date vA vB vC |
|----------------------------------------|
1. | 1 1 12477 A27 B94 C30 |
2. | 1 2 14603 A11 B14 C89 |
|----------------------------------------|
3. | 2 1 18718 A25 B82 C65 |
|----------------------------------------|
4. | 3 1 15776 A11 B72 C68 |
5. | 3 2 18082 A21 B55 C26 |
6. | 3 3 18786 A62 B47 C82 |
|----------------------------------------|
7. | 4 1 14773 A77 B19 C73 |
8. | 4 2 19177 A99 B04 C54 |
|----------------------------------------|
9. | 5 1 20388 A22 B95 C08 |
+----------------------------------------+
Upvotes: 1