Reputation: 333
I have wide longitudinal data that I would like to reshape into long data. This is a sample:
sex group id sex.1 group.1 status1 beg1 end1 status2 beg2 end2
1 1000 1 a 1000 1 a Vocational <NA> S2007 HE S2007 S2008
2 1001 1 a 1001 1 a Vocational <NA> S2007 HE S2008 S2012
3 1004 1 a 1004 1 a Vocational <NA> S2008 999 <NA> <NA>
4 1006 2 a 1006 2 a Vocational <NA> S2007 Army S2012 <NA>
5 1007 1 a 1007 1 a HE <NA> S2007 999 <NA> <NA>
6 1008 1 a 1008 1 a Vocational S2013 <NA> 999 <NA> <NA>
I need to get it in this shape, compatible with SPELL format:
id sex group index status beg end
1000 1 a 1 Vocational NA S2007
1000 1 a 2 HE S2008 S2012
...
I am using the following command:
spell <- reshape(data,
varying=names(data)[4:60],
direction="long",
idvar=c("id","sex","group"),
sep="")
And I get the following error message:
Error in `row.names<-.data.frame`(`*tmp*`, value = paste(d[, idvar], times[1L], :
duplicate 'row.names' are not allowed
In addition: Warning message: non-unique value when setting 'row.names': ‘NA.1’
I have tried setting NA values to 999 this way, but it does not work.
data[is.na(data)] <- 999
Do you know what may get this to work? thanks a lot beforehand!
Upvotes: 4
Views: 5443
Reputation: 1510
You may be able to resolve the "duplicate 'row.names' are not allowed" error message by specifying new row names when you reshape by using reshape
's new.row.names
option:
spell <- reshape(data,
varying = names(data)[4:60],
direction = "long",
idvar = c("id","sex","group"),
sep = "",
new.row.names = 1:1000)
Upvotes: 0
Reputation: 18868
That error message indicates that you either have duplicate rows or missing values in the id variable(s).
Check for duplicates first:
with(data, any(duplicated(cbind(id, sex, group))))
If TRUE, then there's your answer.
If FALSE, then you may have missing values in the id variable(s), maybe even whole missing rows, and probably at the end. This can be due to the actual source data having blank rows or your R command to import the data, for example using read_excel and specifying too many rows in the range argument. In any case, check the data carefully for missing values in the id variable(s). Replacing them all with 999 won't help.
Upvotes: 4
Reputation: 15458
x2 <- reshape(mydata, idvar=c("id.1", "sex.1", "group.1"), direction="long",
varying=list(c(7, 10), c(8, 11), c(9, 12)),
v.names=c("status","beg","end"))
head(x2)
id sex group id.1 sex.1 group.1 time status beg end
1000.1.a.1 1000 1 a 1000 1 a 1 Vocational <NA> S2007
1001.1.a.1 1001 1 a 1001 1 a 1 Vocational <NA> S2007
1004.1.a.1 1004 1 a 1004 1 a 1 Vocational <NA> S2008
1006.2.a.1 1006 2 a 1006 2 a 1 Vocational <NA> S2007
1007.1.a.1 1007 1 a 1007 1 a 1 HE <NA> S2007
1008.1.a.1 1008 1 a 1008 1 a 1 Vocational S2013 <NA>
Upvotes: 1
Reputation: 887301
Assuming that "id.1", "sex.1", and "group.1" are duplicated columns, we could remove those columns, change the column names by inserting a delimiter ("_"), and reshape
data1 <- data[-(4:6)]
nm1 <- sub('\\d+$', '', names(data1)[-(1:3)])
names(data1)[-(1:3)] <- paste(nm1, ave(nm1, nm1, FUN=seq_along), sep="_")
res <- reshape(data1, varying=4:ncol(data1), direction='long',
idvar=c('id', 'sex', 'group'), sep="_")
row.names(res) <- NULL
head(res)
# id sex group time status beg end
# 1 1000 1 a 1 Vocational <NA> S2007
# 2 1001 1 a 1 Vocational <NA> S2007
# 3 1004 1 a 1 Vocational <NA> S2008
# 4 1006 2 a 1 Vocational <NA> S2007
# 5 1007 1 a 1 HE <NA> S2007
# 6 1008 1 a 1 Vocational S2013 <NA>
data <- structure(list(id = c(1000L, 1001L, 1004L, 1006L, 1007L, 1008L
), sex = c(1L, 1L, 1L, 2L, 1L, 1L), group = c("a", "a", "a",
"a", "a", "a"), id.1 = c(1000L, 1001L, 1004L, 1006L, 1007L, 1008L
), sex.1 = c(1L, 1L, 1L, 2L, 1L, 1L), group.1 = c("a", "a", "a",
"a", "a", "a"), status1 = c("Vocational", "Vocational", "Vocational",
"Vocational", "HE", "Vocational"), beg1 = c("<NA>", "<NA>", "<NA>",
"<NA>", "<NA>", "S2013"), end1 = c("S2007", "S2007", "S2008",
"S2007", "S2007", "<NA>"), status2 = c("HE", "HE", "999", "Army",
"999", "999"), beg2 = c("S2007", "S2008", "<NA>", "S2012", "<NA>",
"<NA>"), end2 = c("S2008", "S2012", "<NA>", "<NA>", "<NA>", "<NA>"
)), .Names = c("id", "sex", "group", "id.1", "sex.1", "group.1",
"status1", "beg1", "end1", "status2", "beg2", "end2"), class =
"data.frame", row.names = c(NA, -6L))
Upvotes: 1