Reputation: 53
I have a data frame with the following format:
Species Annotation Gene Group Mean_expression
ARIRE TAR2_ARATH Tr_200_G1_i1 1 8.408
CYLIM TAR2_ARATH Tr_11_G1_i1 1 10.39
ECHPL TAR2_ARATH Tr_222_G1_i1 1 9.32
FERPI TAR2_ARATH Tr_600_G1_i3 1 11.21
ARIRE BRL2_ORYSH Tr_80_G1_i9 2 180.33
CYLIM BRL2_ORYSH Tr_320_G1_i1 2 200.227
CYLIM BRL2_ORYSH Tr_320_G1_i2 2 150.343
ECHPL BRL2_ARATH TR_111_G1_i5 2 100.209
I would like to have a data frame with one column per species:
        ARIRE    CYLIM    ECHPL    FERPI
Group1  8.408    10.39    9.32     11.21
Group2  180.33   200.227  100.209  NA
Group2  NA       150.343  NA       NA
Do you have any idea what the best way to do this is? I've already converted the data frame into a list and tried split() and reshape(), but with no good results.
Any help will be appreciated.
Upvotes: 2
Views: 736
Reputation: 17289
Here is a data.table solution (dtt is the data.table built under "the data" below):
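library(data.table)
# number the rows within each Species/Annotation pair, so repeated
# entries (CYLIM has two in group 2) land on separate output rows
dtt[, g := seq_len(.N), by = .(Species, Annotation)]
# long to wide: one row per Group + g, one column per Species
res <- dcast(dtt, Group + g ~ Species, value.var = 'Mean_expression')
# drop the helper index from the result
res[, g := NULL]
res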
# > res
# Group ARIRE CYLIM ECHPL FERPI
# 1: 1 8.408 10.390 9.320 11.21
# 2: 2 180.330 200.227 100.209 NA
# 3: 2 NA 150.343 NA NA
The data:
dtt <- read.table(textConnection('Species Annotation Gene Group Mean_expression
ARIRE TAR2_ARATH Tr_200_G1_i1 1 8.408
CYLIM TAR2_ARATH Tr_11_G1_i1 1 10.39
ECHPL TAR2_ARATH Tr_222_G1_i1 1 9.32
FERPI TAR2_ARATH Tr_600_G1_i3 1 11.21
ARIRE BRL2_ORYSH Tr_80_G1_i9 2 180.33
CYLIM BRL2_ORYSH Tr_320_G1_i1 2 200.227
CYLIM BRL2_ORYSH Tr_320_G1_i2 2 150.343
ECHPL BRL2_ARATH TR_111_G1_i5 2 100.209'), header = TRUE)
setDT(dtt)
With rowid() from data.table, the above solution can be simplified further:
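# the helper column built by rowid() comes out named "Species" here,
# which is why that is the column dropped at the end
dcast(
  dtt,
  Group + rowid(Species, Annotation) ~ Species,
  value.var = 'Mean_expression'
)[, Species := NULL]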
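To see why the two versions are interchangeable, it may help to look at the index that rowid() produces. The snippet below is only a sketch: it rebuilds the question's rows inline (leaving out the Gene column, which the reshape does not use) and stores the index in a variable named idx, which is a name chosen here for illustration.

```r
library(data.table)

# Same rows as in the question, rebuilt inline so the snippet runs on its own
dtt <- data.table(
  Species = c("ARIRE", "CYLIM", "ECHPL", "FERPI",
              "ARIRE", "CYLIM", "CYLIM", "ECHPL"),
  Annotation = c("TAR2_ARATH", "TAR2_ARATH", "TAR2_ARATH", "TAR2_ARATH",
                 "BRL2_ORYSH", "BRL2_ORYSH", "BRL2_ORYSH", "BRL2_ARATH"),
  Group = c(1, 1, 1, 1, 2, 2, 2, 2),
  Mean_expression = c(8.408, 10.39, 9.32, 11.21,
                      180.33, 200.227, 150.343, 100.209)
)

# rowid() numbers the rows within each Species/Annotation combination
idx <- rowid(dtt$Species, dtt$Annotation)
idx
# [1] 1 1 1 1 1 1 2 1

# The grouped seq_len(.N) from the first solution yields the same index
dtt[, g := seq_len(.N), by = .(Species, Annotation)]
identical(idx, dtt$g)
# [1] TRUE
```

Only the second CYLIM / BRL2_ORYSH row gets index 2, which is what pushes it onto its own row in the wide result instead of colliding with the first one.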
Upvotes: 3