PatraoPedro
PatraoPedro

Reputation: 197

Partially transpose a dataframe in R

Given the following set of data:

 transect <- c("B","N","C","D","H","J","E","L","I","I")
 sampler <- c(rep("J",5),rep("W",5))
 species <- c("ROB","HAW","HAW","ROB","PIG","HAW","PIG","PIG","HAW","HAW")
 weight <- c(2.80,52.00,56.00,2.80,16.00,55.00,16.20,18.30,52.50,57.00)
 wingspan <- c(13.9, 52.0, 57.0, 13.7, 11.0,52.5, 10.7, 11.1, 52.3, 55.1)
 week <- c(1,2,3,4,5,6,7,8,9,9)
 # Warning to R newbs: Really bad idea to use this code
 ex <- as.data.frame(cbind(transect,sampler,species,weight,wingspan,week))

What I’m trying to achieve is to transpose the species and its associated information on weight and wingspan. For a better idea of the expected result please see below. My data set is about half a million lines long with approximately 200 different species so it will be a very large dataframe.

      transect sampler week ROBweight HAWweight PIGweight ROBwingspan HAWwingspan PIGwingspan
1         B       J    1       2.8       0.0       0.0        13.9         0.0         0.0
2         N       J    2       0.0      52.0       0.0         0.0        52.0         0.0
3         C       J    3       0.0      56.0       0.0         0.0        57.0         0.0
4         D       J    4       2.8       0.0       0.0        13.7         0.0         0.0
5         H       J    5       0.0       0.0      16.0         0.0         0.0        11.0
6         J       W    6       0.0      55.0       0.0         0.0        52.5         0.0
7         E       W    7       0.0       0.0      16.2         0.0         0.0        10.7
8         L       W    8       0.0       0.0      18.3         0.0         0.0        11.1
9         I       W    9       0.0      52.5       0.0         0.0        52.3         0.0
10        I       W    9       0.0      57.0       0.0         0.0        55.1         0.0

Upvotes: 3

Views: 1527

Answers (1)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

The main problem is that you don't currently have unique "id" variables, which will create problems for the usual suspects of reshape and dcast.

Here's a solution. I've used getanID from my "splitstackshape" package, but it's pretty easy to create your own unique ID variable using many different methods.

library(splitstackshape)
library(reshape2)
idvars <- c("transect", "sampler", "week")
ex <- getanID(ex, id.vars=idvars)

From here, you have two options:

reshape from base R:

reshape(ex, direction = "wide", 
        idvar=c("transect", "sampler", "week", ".id"), 
        timevar="species")

melt and dcast from "reshape2"

First, melt your data into a "long" form.

exL <- melt(ex, id.vars=c(idvars, ".id", "species"))

Then, cast your data into a wide form.

dcast(exL, transect + sampler + week + .id ~ species + variable)
#    transect sampler week .id HAW_weight HAW_wingspan PIG_weight PIG_wingspan ROB_weight ROB_wingspan
# 1         B       J    1   1         NA           NA         NA           NA        2.8         13.9
# 2         C       J    3   1       56.0         57.0         NA           NA         NA           NA
# 3         D       J    4   1         NA           NA         NA           NA        2.8         13.7
# 4         E       W    7   1         NA           NA       16.2         10.7         NA           NA
# 5         H       J    5   1         NA           NA       16.0         11.0         NA           NA
# 6         I       W    9   1       52.5         52.3         NA           NA         NA           NA
# 7         I       W    9   2       57.0         55.1         NA           NA         NA           NA
# 8         J       W    6   1       55.0         52.5         NA           NA         NA           NA
# 9         L       W    8   1         NA           NA       18.3         11.1         NA           NA
# 10        N       J    2   1       52.0         52.0         NA           NA         NA           NA

A better option: "data.table"

Alternatively (and perhaps preferably), you can use the "data.table" package (at least version 1.8.11) as follows:

library(data.table)
library(reshape2) ## Also required here
packageVersion("data.table")
# [1] ‘1.8.11’
DT <- data.table(ex)
DT[, .id := sequence(.N), by = c("transect", "sampler", "week")]
DTL <- melt(DT, measure.vars=c("weight", "wingspan"))
dcast.data.table(DTL, transect + sampler + week + .id ~ species + variable)
#     transect sampler week .id HAW_weight HAW_wingspan PIG_weight PIG_wingspan ROB_weight ROB_wingspan
#  1:        B       J    1   1         NA           NA         NA           NA        2.8         13.9
#  2:        C       J    3   1       56.0         57.0         NA           NA         NA           NA
#  3:        D       J    4   1         NA           NA         NA           NA        2.8         13.7
#  4:        E       W    7   1         NA           NA       16.2         10.7         NA           NA
#  5:        H       J    5   1         NA           NA       16.0         11.0         NA           NA
#  6:        I       W    9   1       52.5         52.3         NA           NA         NA           NA
#  7:        I       W    9   2       57.0         55.1         NA           NA         NA           NA
#  8:        J       W    6   1       55.0         52.5         NA           NA         NA           NA
#  9:        L       W    8   1         NA           NA       18.3         11.1         NA           NA
# 10:        N       J    2   1       52.0         52.0         NA           NA         NA           NA

Add fill = 0 to either of the dcast versions to replace NA values with 0.

Upvotes: 3

Related Questions