GodinA

Reputation: 1093

Reshaping data in R without using dcast (reshape2)

My dcast code in R no longer runs. I have hit the problem discussed here: segfault in R using reshape2 package and dcast

The bug has not been fixed yet, so I am looking for other ways to produce my dcast output. Any suggestions would be greatly appreciated!

Below is a very small dput of my dataset. There is one entry per species per survey ID ("EID"). I would like one entry per survey ID ("EID"), with every species as a column holding its associated value (here "elasmo.discard"), i.e., wide format.

> dput(sample)
structure(list(EID = c("L00155/69/2000-09-06", "Q99107/178/1999-08-23", 
"G02192/1/2002-07-08", "G97158/1/1997-10-26", "Q06091/2/2006-07-04", 
"L00004/171/2000-03-01", "G11094/15/2011-09-05", "Q04127/16/2004-07-28", 
"Q02122/230/2002-10-29", "G08002/6/2008-02-03", "Q99006/143/1999-02-17", 
"Q08053/3/2008-06-12", "Q99128/22/1999-08-19", "L00177/83/2000-12-18", 
"Q05122/11/2005-08-30", "Q04156/44/2004-10-29", "L01097/69/2001-06-26", 
"G08004/169/2008-05-14", "Q03041/26/2003-06-14", "G98115/60/1998-09-11", 
"G00002/20/2000-01-17", "G00002/20/2000-01-17", "G00054/1/2000-05-31", 
"G00054/1/2000-05-31"), tspp.name = structure(c(13L, 13L, 13L, 
13L, 16L, 13L, 13L, 4L, 13L, 13L, 13L, 13L, 13L, 11L, 4L, 13L, 
13L, 13L, 13L, 20L, 13L, 13L, 24L, 24L), .Label = c("American plaice", 
"American sand lance", "Arctic cod", "Atlantic cod", "Atlantic halibut", 
"Atlantic herring", "Bigeye tuna", "Black dogfish", "Bluefin tuna", 
"Capelin", "Greenland halibut", "Lookdown", "Northern shrimp", 
"Ocean quahog", "Porbeagle", "Redfishes", "Slenteye headlightfish", 
"Smooth flounder", "Spiny dogfish", "Striped pink shrimp", "Summer flounder", 
"White hake", "Winter flounder", "Witch flounder", "Yellowtail flounder"
), class = "factor"), elasmo.name = structure(c(26L, 30L, 30L, 
30L, 30L, 25L, 21L, 30L, 30L, 30L, 30L, 21L, 30L, 5L, 30L, 30L, 
30L, 21L, 30L, 30L, 14L, 21L, 24L, 21L), .Label = c("Arctic skate", 
"Atlantic sharpnose shark", "Barndoor skate", "Basking shark", 
"Black dogfish", "Blue shark", "Deepsea cat shark", "Greenland shark", 
"Jensen's skate", "Little skate", "Manta", "Ocean quahog", "Oceanic whitetip shark", 
"Porbeagle", "Portuguese shark", "Rough sagre", "Roughtail stingray", 
"Round skate", "Sharks", "Shortfin mako", "Skates", "Smooth skate", 
"Soft skate", "Spiny dogfish", "Spinytail skate", "Thorny skate", 
"White shark", "White skate", "Winter skate", "NA"), class = "factor"), 
    elasmo.discard = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
    25, 0, 0, 0, 1, 0, 0, 1, 1, 15, 25)), .Names = c("EID", "tspp.name", 
"elasmo.name", "elasmo.discard"), class = "data.frame", row.names = c("18496", 
"488791", "87549", "236671", "139268", "15606", "11132", "115531", 
"93441", "159675", "403751", "42587", "485941", "19285", "130395", 
"119974", "73826", "7953", "99124", "351461", "71", "72", "184", 
"185"))

In the end, I would like to obtain the output of this:

library(reshape2)  # dcast() is provided by reshape2
test <- dcast(sample, ... ~ elasmo.name, value.var = "elasmo.discard", fun.aggregate = sum)
test

Note that the dcast code works on this sample, but I get a fatal error when I run it on my full dataset, which has 145349 rows.

Many thanks!!

Upvotes: 0

Views: 500

Answers (2)

Aaron - mostly inactive

Reputation: 37754

This would be the pre-Hadley method; first aggregate to get the sums, then reshape.

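d <- sample  # the question's data frame
# Sum elasmo.discard within each EID / tspp.name / elasmo.name combination
foo <- aggregate(d[,4,drop=FALSE], by=d[,1:3], sum)
# One row per EID / tspp.name, one elasmo.discard.<name> column per elasmo.name
reshape(foo, v.names="elasmo.discard", idvar=c("EID", "tspp.name"), 
             timevar="elasmo.name", direction="wide")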

If the first part is slow, it may help to have fewer columns in the "by" part. It looks like tspp.name is determined by EID; if so, don't aggregate by it, and add it back in after the fact instead.
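A minimal sketch of that idea, assuming tspp.name really is constant within each EID. The toy data frame and the names sums, wide and lookup are made up for illustration; only the column names come from the question.

```r
# Toy data with the question's column names (values are invented).
d <- data.frame(
  EID = c("A", "A", "A", "B", "B", "C"),
  tspp.name = c("Northern shrimp", "Northern shrimp", "Northern shrimp",
                "Witch flounder", "Witch flounder", "Atlantic cod"),
  elasmo.name = c("Porbeagle", "Skates", "Skates",
                  "Spiny dogfish", "Skates", "NA"),
  elasmo.discard = c(1, 1, 2, 15, 25, 0),
  stringsAsFactors = FALSE
)

# Aggregate by EID and elasmo.name only, leaving tspp.name out of the grouping.
sums <- aggregate(elasmo.discard ~ EID + elasmo.name, data = d, FUN = sum)

# Reshape to wide: one row per EID, one elasmo.discard.<name> column per species.
wide <- reshape(sums, v.names = "elasmo.discard", idvar = "EID",
                timevar = "elasmo.name", direction = "wide")

# Add tspp.name back from a one-row-per-EID lookup table.
lookup <- unique(d[, c("EID", "tspp.name")])
stopifnot(!anyDuplicated(lookup$EID))  # fails if tspp.name varies within an EID
wide <- merge(lookup, wide, by = "EID")
wide
```

Unlike dcast with fun.aggregate = sum, reshape leaves NA where an EID has no row for a given elasmo.name, so those cells may need to be set to 0 afterwards.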

If the second part is slow, perhaps try one of the methods here: https://stackoverflow.com/a/9617424/210673.
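Another base R option, not necessarily one of those at the link, is xtabs(), which sums the left-hand side of its formula over each combination of the right-hand-side variables and so does the aggregation and the widening in one step. A sketch on made-up toy data with the question's column names:

```r
# Toy data with the question's column names (values are invented).
d <- data.frame(
  EID = c("A", "A", "A", "B", "B", "C"),
  elasmo.name = c("Porbeagle", "Skates", "Skates",
                  "Spiny dogfish", "Skates", "NA"),
  elasmo.discard = c(1, 1, 2, 15, 25, 0),
  stringsAsFactors = FALSE
)

# Rows are EIDs, columns are elasmo.name values, cells are summed discards.
tab <- xtabs(elasmo.discard ~ EID + elasmo.name, data = d)

# Convert the contingency table to an ordinary wide data frame.
wide <- as.data.frame.matrix(tab)

# Move the EIDs from the row names into a regular column.
wide <- cbind(EID = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

Combinations that never occur come out as 0, which matches what summing an empty group gives. This drops tspp.name; if it is needed, it can be merged back in by EID afterwards.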

To get better help on speeding it up, provide an appropriate example (perhaps using sample or rep) that code can be tested on. Solution speed often depends on how many unique combinations of each variable there are.
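As a rough sketch of such an example, the code below simulates a data frame of the size quoted in the question. The number of distinct EIDs, the ID format, the species labels and the Poisson counts are all assumptions chosen for illustration; they should be adjusted to resemble the real data.

```r
set.seed(1)                 # make the simulated data reproducible

n_rows   <- 145349          # row count quoted in the question
n_eid    <- 50000           # assumed number of distinct survey IDs
n_elasmo <- 30              # number of elasmo.name levels in the posted dput

# Hypothetical ID and species labels.
eid     <- sprintf("E%05d", seq_len(n_eid))
species <- paste0("sp", seq_len(n_elasmo))

big <- data.frame(
  EID            = sample(eid, n_rows, replace = TRUE),
  elasmo.name    = factor(sample(species, n_rows, replace = TRUE)),
  elasmo.discard = rpois(n_rows, lambda = 2)
)

# The counts that tend to drive reshaping speed.
length(unique(big$EID))
nlevels(big$elasmo.name)
```

Timing a candidate solution on big with system.time() then gives a fairer comparison than the 24-row sample.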

Upvotes: 1

djhurio

Reputation: 5536

I cannot reproduce the error; see the code below. I increased the number of rows in sample to 196608.

The number of categories in sample$elasmo.name probably plays a role.

library(reshape2)

sample <- structure(list(EID = c("L00155/69/2000-09-06", "Q99107/178/1999-08-23", 
  "G02192/1/2002-07-08", "G97158/1/1997-10-26", "Q06091/2/2006-07-04", 
  "L00004/171/2000-03-01", "G11094/15/2011-09-05", "Q04127/16/2004-07-28", 
  "Q02122/230/2002-10-29", "G08002/6/2008-02-03", "Q99006/143/1999-02-17", 
  "Q08053/3/2008-06-12", "Q99128/22/1999-08-19", "L00177/83/2000-12-18", 
  "Q05122/11/2005-08-30", "Q04156/44/2004-10-29", "L01097/69/2001-06-26", 
  "G08004/169/2008-05-14", "Q03041/26/2003-06-14", "G98115/60/1998-09-11", 
  "G00002/20/2000-01-17", "G00002/20/2000-01-17", "G00054/1/2000-05-31", 
  "G00054/1/2000-05-31"), tspp.name = structure(c(13L, 13L, 13L, 
  13L, 16L, 13L, 13L, 4L, 13L, 13L, 13L, 13L, 13L, 11L, 4L, 13L, 
  13L, 13L, 13L, 20L, 13L, 13L, 24L, 24L), .Label = c("American plaice", 
  "American sand lance", "Arctic cod", "Atlantic cod", "Atlantic halibut", 
  "Atlantic herring", "Bigeye tuna", "Black dogfish", "Bluefin tuna", 
  "Capelin", "Greenland halibut", "Lookdown", "Northern shrimp", 
  "Ocean quahog", "Porbeagle", "Redfishes", "Slenteye headlightfish", 
  "Smooth flounder", "Spiny dogfish", "Striped pink shrimp", "Summer flounder", 
  "White hake", "Winter flounder", "Witch flounder", "Yellowtail flounder"
  ), class = "factor"), elasmo.name = structure(c(26L, 30L, 30L, 
  30L, 30L, 25L, 21L, 30L, 30L, 30L, 30L, 21L, 30L, 5L, 30L, 30L, 
  30L, 21L, 30L, 30L, 14L, 21L, 24L, 21L), .Label = c("Arctic skate", 
  "Atlantic sharpnose shark", "Barndoor skate", "Basking shark", 
  "Black dogfish", "Blue shark", "Deepsea cat shark", "Greenland shark", 
  "Jensen's skate", "Little skate", "Manta", "Ocean quahog", "Oceanic whitetip shark", 
  "Porbeagle", "Portuguese shark", "Rough sagre", "Roughtail stingray", 
  "Round skate", "Sharks", "Shortfin mako", "Skates", "Smooth skate", 
  "Soft skate", "Spiny dogfish", "Spinytail skate", "Thorny skate", 
  "White shark", "White skate", "Winter skate", "NA"), class = "factor"), 
      elasmo.discard = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
      25, 0, 0, 0, 1, 0, 0, 1, 1, 15, 25)), .Names = c("EID", "tspp.name", 
  "elasmo.name", "elasmo.discard"), class = "data.frame", row.names = c("18496", 
  "488791", "87549", "236671", "139268", "15606", "11132", "115531", 
  "93441", "159675", "403751", "42587", "485941", "19285", "130395", 
  "119974", "73826", "7953", "99124", "351461", "71", "72", "184", 
  "185"))

n <- nrow(sample)
N <- 145349
p <- ceiling(log2(N / n))
n * 2^p
n * 2^p > N

# Bad way of increasing the row number
for (i in 1:p) sample <- rbind(sample, sample)

nrow(sample)

class(sample)
head(sample)

table(sample$elasmo.name)
table(as.character(sample$elasmo.name))

test <- dcast(sample, ... ~ elasmo.name,
              value.var = "elasmo.discard",
              fun.aggregate = sum)
head(test)
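To look into the guess about the number of categories, it may help to compare how many levels the factor declares with how many actually occur, and to drop the unused ones before reshaping. Whether this has any bearing on the crash is not known; the small factor below is made up for illustration.

```r
# A factor that declares four levels but only uses three of them.
x <- factor(c("Skates", "NA", "Porbeagle", "NA"),
            levels = c("Basking shark", "Porbeagle", "Skates", "NA"))

nlevels(x)          # number of declared levels
length(unique(x))   # number of levels that actually occur
table(x)            # unused levels show up with a count of 0

# droplevels() removes the levels that never occur.
x2 <- droplevels(x)
nlevels(x2)
```

For a data frame, droplevels(sample) applies the same cleanup to every factor column.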

Upvotes: 0
