Alby

Reputation: 5742

How do you get a result similar to plyr's dlply using dplyr?

The data look like the following:

> tmp
         gene         go
1      44M2.3 GO:0000166
2      44M2.3 GO:0003723
3      44M2.3 GO:0004527
4      44M2.3 GO:0005730
5      44M2.3 GO:0070062
6      44M2.3 GO:0090305
7      44M2.3 GO:0090305
8      44M2.3 GO:0090305
9  A0A087WUJ7 GO:0004553
10 A0A087WUJ7 GO:0005975

> dput(tmp)
structure(list(gene = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L), .Label = c("44M2.3", "A0A087WUJ7"), class = "factor"), 
    go = structure(c(1L, 2L, 3L, 5L, 7L, 8L, 8L, 8L, 4L, 6L), .Label = c("GO:0000166", 
    "GO:0003723", "GO:0004527", "GO:0004553", "GO:0005730", "GO:0005975", 
    "GO:0070062", "GO:0090305"), class = "factor")), .Names = c("gene", 
"go"), row.names = c(NA, -10L), class = "data.frame")

With the plyr package, I can obtain a list of genes and their corresponding GO terms as follows:

> dlply(tmp, .(gene),function(x) {x[["go"]]})
$`44M2.3`
[1] GO:0000166 GO:0003723 GO:0004527 GO:0005730 GO:0070062 GO:0090305 GO:0090305 GO:0090305
Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305

$A0A087WUJ7
[1] GO:0004553 GO:0005975
Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305

But how can I achieve similar behavior with dplyr?

Upvotes: 2

Views: 87

Answers (1)

Steven Beaupré

Reputation: 21631

As mentioned in the comments, a base R approach would be:

split(tmp$go, f = tmp$gene)

Which gives:

#$`44M2.3`
#[1] GO:0000166 GO:0003723 GO:0004527 GO:0005730 GO:0070062 GO:0090305 GO:0090305 GO:0090305
#Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305

#$A0A087WUJ7
#[1] GO:0004553 GO:0005975
#Levels: GO:0000166 GO:0003723 GO:0004527 GO:0004553 GO:0005730 GO:0005975 GO:0070062 GO:0090305
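
Since the question asks specifically about dplyr, here is a minimal sketch of one possible way to get a similar named list. It is not part of the base R approach above: it assumes a list-column built with `summarise()`, and the names `res` and `go_list` are made up for illustration. The data frame is rebuilt from the values shown in the question so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question (both columns as factors,
# matching the dput() output).
tmp <- data.frame(
  gene = factor(c(rep("44M2.3", 8), rep("A0A087WUJ7", 2))),
  go   = factor(c("GO:0000166", "GO:0003723", "GO:0004527", "GO:0005730",
                  "GO:0070062", "GO:0090305", "GO:0090305", "GO:0090305",
                  "GO:0004553", "GO:0005975"))
)

# One row per gene, with that gene's GO terms collected in a list-column.
res <- tmp %>%
  group_by(gene) %>%
  summarise(go = list(go))

# Turn the list-column into a named list, one element per gene
# (hypothetical helper step, mirroring the shape of the dlply() result).
go_list <- setNames(res$go, as.character(res$gene))
```

`go_list[["A0A087WUJ7"]]` then holds the GO terms for that gene. For this particular task the `split()` call is shorter; the dplyr version may be preferable only when the grouping is already part of a longer pipeline.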

Upvotes: 1
