Reputation: 197
Given the following set of data:
transect <- c("B","N","C","D","H","J","E","L","I","I")
sampler <- c(rep("J",5),rep("W",5))
species <- c("ROB","HAW","HAW","ROB","PIG","HAW","PIG","PIG","HAW","HAW")
weight <- c(2.80,52.00,56.00,2.80,16.00,55.00,16.20,18.30,52.50,57.00)
wingspan <- c(13.9, 52.0, 57.0, 13.7, 11.0,52.5, 10.7, 11.1, 52.3, 55.1)
week <- c(1,2,3,4,5,6,7,8,9,9)
# Warning to R newbs: Really bad idea to use this code
ex <- as.data.frame(cbind(transect,sampler,species,weight,wingspan,week))
What I’m trying to achieve is to transpose the species and its associated information on weight and wingspan. For a better idea of the expected result please see below. My data set is about half a million lines long with approximately 200 different species so it will be a very large dataframe.
transect sampler week ROBweight HAWweight PIGweight ROBwingspan HAWwingspan PIGwingspan
1 B J 1 2.8 0.0 0.0 13.9 0.0 0.0
2 N J 2 0.0 52.0 0.0 0.0 52.0 0.0
3 C J 3 0.0 56.0 0.0 0.0 57.0 0.0
4 D J 4 2.8 0.0 0.0 13.7 0.0 0.0
5 H J 5 0.0 0.0 16.0 0.0 0.0 11.0
6 J W 6 0.0 55.0 0.0 0.0 52.5 0.0
7 E W 7 0.0 0.0 16.2 0.0 0.0 10.7
8 L W 8 0.0 0.0 18.3 0.0 0.0 11.1
9 I W 9 0.0 52.5 0.0 0.0 52.3 0.0
10 I W 9 0.0 57.0 0.0 0.0 55.1 0.0
Upvotes: 3
Views: 1527
Reputation: 193517
The main problem is that you don't currently have unique "id" variables, which will create problems for the usual suspects of reshape
and dcast
.
Here's a solution. I've used getanID
from my "splitstackshape" package, but it's pretty easy to create your own unique ID variable using many different methods.
library(splitstackshape)
library(reshape2)
idvars <- c("transect", "sampler", "week")
ex <- getanID(ex, id.vars=idvars)
From here, you have two options:
reshape
from base R:reshape(ex, direction = "wide",
idvar=c("transect", "sampler", "week", ".id"),
timevar="species")
melt
and dcast
from "reshape2"First, melt
your data into a "long" form.
exL <- melt(ex, id.vars=c(idvars, ".id", "species"))
Then, cast
your data into a wide form.
dcast(exL, transect + sampler + week + .id ~ species + variable)
# transect sampler week .id HAW_weight HAW_wingspan PIG_weight PIG_wingspan ROB_weight ROB_wingspan
# 1 B J 1 1 NA NA NA NA 2.8 13.9
# 2 C J 3 1 56.0 57.0 NA NA NA NA
# 3 D J 4 1 NA NA NA NA 2.8 13.7
# 4 E W 7 1 NA NA 16.2 10.7 NA NA
# 5 H J 5 1 NA NA 16.0 11.0 NA NA
# 6 I W 9 1 52.5 52.3 NA NA NA NA
# 7 I W 9 2 57.0 55.1 NA NA NA NA
# 8 J W 6 1 55.0 52.5 NA NA NA NA
# 9 L W 8 1 NA NA 18.3 11.1 NA NA
# 10 N J 2 1 52.0 52.0 NA NA NA NA
Alternatively (and perhaps preferably), you can use the "data.table" package (at least version 1.8.11) as follows:
library(data.table)
library(reshape2) ## Also required here
packageVersion("data.table")
# [1] ‘1.8.11’
DT <- data.table(ex)
DT[, .id := sequence(.N), by = c("transect", "sampler", "week")]
DTL <- melt(DT, measure.vars=c("weight", "wingspan"))
dcast.data.table(DTL, transect + sampler + week + .id ~ species + variable)
# transect sampler week .id HAW_weight HAW_wingspan PIG_weight PIG_wingspan ROB_weight ROB_wingspan
# 1: B J 1 1 NA NA NA NA 2.8 13.9
# 2: C J 3 1 56.0 57.0 NA NA NA NA
# 3: D J 4 1 NA NA NA NA 2.8 13.7
# 4: E W 7 1 NA NA 16.2 10.7 NA NA
# 5: H J 5 1 NA NA 16.0 11.0 NA NA
# 6: I W 9 1 52.5 52.3 NA NA NA NA
# 7: I W 9 2 57.0 55.1 NA NA NA NA
# 8: J W 6 1 55.0 52.5 NA NA NA NA
# 9: L W 8 1 NA NA 18.3 11.1 NA NA
# 10: N J 2 1 52.0 52.0 NA NA NA NA
Add fill = 0
to either of the dcast
versions to replace NA
values with 0.
Upvotes: 3