Gina Zetkin
Gina Zetkin

Reputation: 333

Error in reshape(): duplicate 'row names' are not allowed

I have wide longitudinal data that I would like to reshape into long data. This is a sample:

sex group id sex.1 group.1    status1  beg1  end1 status2  beg2  end2
1 1000   1     a 1000     1       a Vocational  <NA> S2007      HE S2007 S2008
2 1001   1     a 1001     1       a Vocational  <NA> S2007      HE S2008 S2012
3 1004   1     a 1004     1       a Vocational  <NA> S2008     999  <NA>  <NA>
4 1006   2     a 1006     2       a Vocational  <NA> S2007    Army S2012  <NA>
5 1007   1     a 1007     1       a         HE  <NA> S2007     999  <NA>  <NA>
6 1008   1     a 1008     1       a Vocational S2013  <NA>     999  <NA>  <NA>

I need to get it in this shape, compatible with SPELL format:

  id sex  group index  status    beg     end
1000  1    a      1   Vocational  NA     S2007
1000  1    a      2      HE      S2008   S2012
...

I am using the following command:

spell <- reshape(data, 
                 varying=names(data)[4:60],
                 direction="long",
                 idvar=c("id","sex","group"),
                 sep="")   

And I get the following error message:

    Error in `row.names<-.data.frame`(`*tmp*`, value = paste(d[, idvar], times[1L],  : 
duplicate 'row.names' are not allowed
        In addition: Warning message: non-unique value when setting 'row.names': ‘NA.1’ 

I have tried setting NA values to 999 this way, but it does not work.

data[is.na(data)] <- 999

Do you know what may get this to work? thanks a lot beforehand!

Upvotes: 4

Views: 5443

Answers (4)

coip
coip

Reputation: 1510

You may be able to resolve the "duplicate 'row.names' are not allowed" error message by specifying new row names when you reshape by using reshape's new.row.names option:

spell <- reshape(data, 
                 varying = names(data)[4:60],
                 direction = "long",
                 idvar = c("id","sex","group"),
                 sep = "",
                 new.row.names = 1:1000)   

Upvotes: 0

Edward
Edward

Reputation: 18868

That error message indicates that you either have duplicate rows or missing values in the id variable(s).

Check for duplicates first:

with(data, any(duplicated(cbind(id, sex, group))))

If TRUE, then there's your answer.

If FALSE, then you may have missing values in the id variable(s), maybe even whole missing rows, and probably at the end. This can be due to the actual source data having blank rows or your R command to import the data, for example using read_excel and specifying too many rows in the range argument. In any case, check the data carefully for missing values in the id variable(s). Replacing them all with 999 won't help.

Upvotes: 4

Metrics
Metrics

Reputation: 15458

x2 <- reshape(mydata, idvar=c("id.1", "sex.1", "group.1"), direction="long", 
              varying=list(c(7, 10), c(8, 11), c(9, 12)), 
              v.names=c("status","beg","end"))

head(x2)

             id sex group id.1 sex.1 group.1 time     status   beg   end
1000.1.a.1 1000   1     a 1000     1       a    1 Vocational  <NA> S2007
1001.1.a.1 1001   1     a 1001     1       a    1 Vocational  <NA> S2007
1004.1.a.1 1004   1     a 1004     1       a    1 Vocational  <NA> S2008
1006.2.a.1 1006   2     a 1006     2       a    1 Vocational  <NA> S2007
1007.1.a.1 1007   1     a 1007     1       a    1         HE  <NA> S2007
1008.1.a.1 1008   1     a 1008     1       a    1 Vocational S2013  <NA>

Upvotes: 1

akrun
akrun

Reputation: 887301

Assuming that "id.1", "sex.1", and "group.1" are duplicated columns, we could remove those columns, change the column names by inserting a delimiter ("_"), and reshape

data1 <- data[-(4:6)]
nm1 <- sub('\\d+$', '', names(data1)[-(1:3)])
names(data1)[-(1:3)] <- paste(nm1, ave(nm1, nm1, FUN=seq_along), sep="_")
res <- reshape(data1, varying=4:ncol(data1), direction='long',
             idvar=c('id', 'sex', 'group'), sep="_")
row.names(res) <- NULL
head(res)
#     id sex group time     status  beg   end
# 1 1000   1     a    1 Vocational <NA> S2007
# 2 1001   1     a    1 Vocational <NA> S2007
# 3 1004   1     a    1 Vocational <NA> S2008
# 4 1006   2     a    1 Vocational <NA> S2007
# 5 1007   1     a    1         HE <NA> S2007
# 6 1008   1     a    1 Vocational S2013  <NA>

data

data <- structure(list(id = c(1000L, 1001L, 1004L, 1006L, 1007L, 1008L
 ), sex = c(1L, 1L, 1L, 2L, 1L, 1L), group = c("a", "a", "a", 
"a", "a", "a"), id.1 = c(1000L, 1001L, 1004L, 1006L, 1007L, 1008L
), sex.1 = c(1L, 1L, 1L, 2L, 1L, 1L), group.1 = c("a", "a", "a", 
"a", "a", "a"), status1 = c("Vocational", "Vocational", "Vocational", 
"Vocational", "HE", "Vocational"), beg1 = c("<NA>", "<NA>", "<NA>", 
"<NA>", "<NA>", "S2013"), end1 = c("S2007", "S2007", "S2008", 
"S2007", "S2007", "<NA>"), status2 = c("HE", "HE", "999", "Army", 
"999", "999"), beg2 = c("S2007", "S2008", "<NA>", "S2012", "<NA>", 
"<NA>"), end2 = c("S2008", "S2012", "<NA>", "<NA>", "<NA>", "<NA>"
)), .Names = c("id", "sex", "group", "id.1", "sex.1", "group.1", 
"status1", "beg1", "end1", "status2", "beg2", "end2"), class = 
 "data.frame", row.names = c(NA, -6L))

Upvotes: 1

Related Questions