user18081990

Reputation: 53

Reformat a data frame in R

I have a data frame with the following format:

Species  Annotation  Gene          Group  Mean_expression
ARIRE    TAR2_ARATH  Tr_200_G1_i1  1      8.408
CYLIM    TAR2_ARATH  Tr_11_G1_i1   1      10.39
ECHPL    TAR2_ARATH  Tr_222_G1_i1  1      9.32
FERPI    TAR2_ARATH  Tr_600_G1_i3  1      11.21
ARIRE    BRL2_ORYSH  Tr_80_G1_i9   2      180.33
CYLIM    BRL2_ORYSH  Tr_320_G1_i1  2      200.227
CYLIM    BRL2_ORYSH  Tr_320_G1_i2  2      150.343
ECHPL    BRL2_ARATH  TR_111_G1_i5  2      100.209    

I would like to reshape it so that each species becomes a column and each row holds the mean expression values for one group:

           ARIRE   CYLIM   ECHPL   FERPI 
  Group1   8.4     10.39   9.32    11.21 
  Group2   180.33  200.227 100.209 NA
  Group2   NA      150.343 NA      NA

Do you have any idea what the best way of doing this is? I've already transformed the data frame into a list and tried split and reshape, but without good results.

Any help will be appreciated.

Upvotes: 2

Views: 736

Answers (1)

mt1022

Reputation: 17289

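Here is a data.table solution (dtt is built in "The data" section below):

library(data.table)

# g numbers the repeated rows within each Species/Annotation pair (1, 2, ...)
dtt[, g := seq_len(.N), by = .(Species, Annotation)]
# one row per Group/g combination, one column per Species
res <- dcast(dtt, Group + g ~ Species, value.var = 'Mean_expression')
# drop the helper index
res[, g := NULL]
res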
# > res
#    Group   ARIRE   CYLIM   ECHPL FERPI
# 1:     1   8.408  10.390   9.320 11.21
# 2:     2 180.330 200.227 100.209    NA
# 3:     2      NA 150.343      NA    NA

The data:

dtt <- read.table(textConnection('Species  Annotation  Gene          Group  Mean_expression
ARIRE    TAR2_ARATH  Tr_200_G1_i1  1      8.408
CYLIM    TAR2_ARATH  Tr_11_G1_i1   1      10.39
ECHPL    TAR2_ARATH  Tr_222_G1_i1  1      9.32
FERPI    TAR2_ARATH  Tr_600_G1_i3  1      11.21
ARIRE    BRL2_ORYSH  Tr_80_G1_i9   2      180.33
CYLIM    BRL2_ORYSH  Tr_320_G1_i1  2      200.227
CYLIM    BRL2_ORYSH  Tr_320_G1_i2  2      150.343
ECHPL    BRL2_ARATH  TR_111_G1_i5  2      100.209'), header = TRUE)

setDT(dtt)

Edit

With rowid() from data.table, the solution above can be simplified further:

dcast(
    dtt,
    Group + rowid(Species, Annotation) ~ Species,
    value.var = 'Mean_expression')[, Species := NULL]
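
Since the question mentions trying reshape, here is a rough base R sketch of the same idea: number the repeated rows within each Species/Annotation pair, then spread Species into columns on Group plus that helper index. It is not part of the solution above; the names df, g and wide are placeholders chosen for illustration, and it assumes the same example data as in the question.

```r
# Example data from the question, read as a plain data.frame
df <- read.table(text = 'Species  Annotation  Gene          Group  Mean_expression
ARIRE    TAR2_ARATH  Tr_200_G1_i1  1      8.408
CYLIM    TAR2_ARATH  Tr_11_G1_i1   1      10.39
ECHPL    TAR2_ARATH  Tr_222_G1_i1  1      9.32
FERPI    TAR2_ARATH  Tr_600_G1_i3  1      11.21
ARIRE    BRL2_ORYSH  Tr_80_G1_i9   2      180.33
CYLIM    BRL2_ORYSH  Tr_320_G1_i1  2      200.227
CYLIM    BRL2_ORYSH  Tr_320_G1_i2  2      150.343
ECHPL    BRL2_ARATH  TR_111_G1_i5  2      100.209',
                 header = TRUE, stringsAsFactors = FALSE)

# Helper index (hypothetical name g): 1, 2, ... within each
# Species/Annotation pair, mirroring seq_len(.N) in the data.table code
df$g <- ave(seq_len(nrow(df)), df$Species, df$Annotation, FUN = seq_along)

# Keep only the needed columns, then spread Species into columns;
# each Group/g combination becomes one row
wide <- reshape(df[, c("Group", "g", "Species", "Mean_expression")],
                idvar = c("Group", "g"), timevar = "Species",
                direction = "wide")

# reshape() prefixes the new columns with "Mean_expression."; strip that
names(wide) <- sub("^Mean_expression\\.", "", names(wide))

# Order the rows, then drop the helper index and the old row names
wide <- wide[order(wide$Group, wide$g), ]
wide$g <- NULL
rownames(wide) <- NULL
wide
```

On the example data this should give the same three-row layout as the dcast result, with NA wherever a species has no further entry in a group.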

Upvotes: 3
