Reputation: 41
I'm trying to re-arrange the data in R from something like:
Patient ID,Episode Number,Admission Date (A),Admission Date (H),Admission Time (A),Admission Time (H)
1,5,20/08/2011,21/08/2011,1200,1300
2,6,21/08/2011,22/08/2011,1300,1400
3,7,22/08/2011,23/08/2011,1400,1500
4,8,23/08/2011,24/08/2011,1500,1600
to something like:
Record Type,Patient ID,Episode Number,Admission Date,Admission Time
H,1,5,20/08/2011,1200
A,1,5,21/08/2011,1300
H,2,6,21/08/2011,1300
A,2,6,22/08/2011,1400
H,3,7,22/08/2011,1400
A,3,7,23/08/2011,1500
H,4,8,23/08/2011,1500
A,4,8,24/08/2011,1600
(I've used CSV format so it's easier to to use them as test data).
I tried the reshape() function and it kind of worked:
> reshape(foo, direction = "long", idvar = 1, varying = 3:dim(foo)[2],
> sep = "..", timevar = "dataset")
Patient.ID Episode.Number dataset Admission.Date Admission.Time
1.A. 1 5 A. 20/08/2011 1200
2.A. 2 6 A. 21/08/2011 1300
3.A. 3 7 A. 22/08/2011 1400
4.A. 4 8 A. 23/08/2011 1500
1.H. 1 5 H. 21/08/2011 1300
2.H. 2 6 H. 22/08/2011 1400
3.H. 3 7 H. 23/08/2011 1500
4.H. 4 8 H. 24/08/2011 1600
But it's not in the exact format I wanted (I want for each "Patient ID", the first row is "H" and second row is "A").
In addition, when I extend this to the read data (which has 250+ columns) it failed:
> reshape(realdata, direction = "long", idvar = 1, varying =
> 6:dim(foo)[2], sep = "..", timevar = "dataset")
Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, :
'varying' arguments must be the same length
I think partly because the colnames look like:
> colnames(foo)
[1] "Unique.Key"
[2] "Campus.Code"
[3] "UR"
[4] "Terminal.digit"
[5] "Admission.date..A."
[6] "Admission.date..H."
[7] "Admission.time..A."
[8] "Admission.time..H."
.
.
.
[31] "Medicare.Number"
[32] "Payor"
[33] "Doctor.specialty"
[34] "Clinic"
.
.
.
[202] "Admission.Source..A."
[203] "Admission.Source..H."
i.e. there are "common columns" (without suffixes) between columns with suffixes (hope this makes sense).
Upvotes: 4
Views: 10575
Reputation: 193517
The suggestions to use melt
and cast
(now dcast
and family) from the "reshape" (now "reshape2") package won't get you to the form you are looking for with your data. In particular, you'll need to do some additional processing if your end goal is the "semi-long" format you describe.
There are two problems you raise in your question:
First is the ordering of the results. As @RichieCotton points out in his comment and @mac in his answer, a call to order()
is enough to solve that problem.
Second is the error:
Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, :
'varying' arguments must be the same length
This is because, as you guessed, there are non-varying columns in your list of varying = 6:dim(foo)[2]
selection.
One easy way to fix this is to use grep
to identify which columns are varying and use that to specify your columns instead of using an (incorrect) catchall as you did. Here's a worked example:
set.seed(1)
foo <- data.frame(Unique.Key = 1:4, Campus.Code = LETTERS[1:4],
Admission.Date..A = 11:14, Admission.Date..H = 21:24,
Medicare.Number = letters[1:4], Payor = letters[1:4],
Admission.Source..A = rnorm(4),
Admission.Source..H = rnorm(4))
foo
# Unique.Key Campus.Code Admission.Date..A Admission.Date..H Medicare.Number
# 1 1 A 11 21 a
# 2 2 B 12 22 b
# 3 3 C 13 23 c
# 4 4 D 14 24 d
# Payor Admission.Source..A Admission.Source..H
# 1 a -0.6264538 0.3295078
# 2 b 0.1836433 -0.8204684
# 3 c -0.8356286 0.4874291
# 4 d 1.5952808 0.7383247
Figure out which columns are varying and use that as your varying
argument:
varyingCols <- grep("\\.\\.A$|\\.\\.H$", names(foo))
out <- reshape(foo, direction = "long", idvar = "Unique.Key",
varying = varyingCols, sep = "..")
out[order(out$Unique.Key, rev(out$time)), ]
# Unique.Key Campus.Code Medicare.Number Payor time Admission.Date Admission.Source
# 1.H 1 A a a H 21 0.3295078
# 1.A 1 A a a A 11 -0.6264538
# 2.H 2 B b b H 22 -0.8204684
# 2.A 2 B b b A 12 0.1836433
# 3.H 3 C c c H 23 0.4874291
# 3.A 3 C c c A 13 -0.8356286
# 4.H 4 D d d H 24 0.7383247
# 4.A 4 D d d A 14 1.5952808
If your data are small (not many columns), you can manually count the location of the varying
columns and specify the vector. As you've already noticed, any columns not specified in idvar
or varying
get recycled appropriately.
out <- reshape(foo, direction = "long", idvar = "Unique.Key",
varying = c(3, 4, 7, 8), sep = "..")
Upvotes: 2
Reputation: 3554
You might be able to get what you're after by using melt and cast or reshape, but you're looking for something pretty specific, so it might be simpler to do the reshaping directly. You can subset the original data into two separate data frames, (one for A, one for H) then glue them back together.
The code below works on your sample data, but i also tried to write it flexibly enough so that it will hopefully work on your larger data set, as long as the columns are named consistently with the ..A. and ..H. suffixes.
#grab the common columns and the "A" columns
#(by using grepl to find any column that doesn't end in ".H.")
foo.a <- foo[,!grepl(x=colnames(foo),pattern = "\\.H\\.$")]
#strip the "..A." from the end of the ".A." column names
colnames(foo.a) <- sub(x=colnames(foo.a),
pattern="(.*)\\.\\.A\\.$",
rep = "\\1")
foo.a$Record.Type <- "A"
#grab the common columns and the "H" columns
#(by using grepl to find any column that doesn't end in ".A.")
foo.h <- foo[,!grepl(x=colnames(foo),pattern = "\\.A\\.$")]
#strip the "..H." from the end of the "..H." column names
colnames(foo.h) <- sub(x=colnames(foo.h),
pattern="(.*)\\.\\.H\\.$",
rep = "\\1")
foo.h$Record.Type <- "H"
#stick them back together
new.foo <- rbind(foo.a,foo.h)
#order by Patient.ID
new.foo <- new.foo[with(new.foo,order(Patient.ID)),]
#re-order the columns as you like
new.foo <- new.foo[,c(1,2,5,3,4)]
This gives me:
> new.foo
Patient.ID Episode.Number Record.Type Admission.Date Admission.Time
1 1 5 A 20/08/2011 1200
5 1 5 H 21/08/2011 1300
2 2 6 A 21/08/2011 1300
6 2 6 H 22/08/2011 1400
3 3 7 A 22/08/2011 1400
7 3 7 H 23/08/2011 1500
4 4 8 A 23/08/2011 1500
8 4 8 H 24/08/2011 1600
Upvotes: 0