Louis Maddox

Reputation: 5576

How to retain grouping of values in 'long' data after reshape2::melt?

I'm using the melt function from the R package reshape2 to produce a side-by-side (dual-bar) chart of values for two distinct types of genetic conservation across a few dozen species.

I can order this list while "wide", e.g. arrange(species.table, desc(miR), species):

...            species       miR      snoR
1                  Cow 1.0000000 0.9925373
2                Sheep 1.0000000 0.9925373
3                  Cat 0.9967914 1.0000000
4                  Dog 0.9967914 1.0000000
5                Panda 0.9967914 1.0000000
6     White_rhinoceros 0.9967914 1.0000000
7               Alpaca 0.9775401 0.9626866
8           Guinea_Pig 0.9775401 0.9776119
9                 Pika 0.9775401 0.9626866
10                 Rat 0.9775401 0.9776119
11               Mouse 0.9358289 0.9701493
12               Horse 0.9294118 0.9726368
13                 Pig 0.9294118 0.9726368
14     Chinese_Hamster 0.9155080 0.9527363
...

But the melted (long) data comes out with the two conservation types on different rows, separating the species. How can I get the species 'paired' in the list, rather than e.g.:

...            species variable     value
1                  Cat     snoR 1.0000000
2                  Cow      miR 1.0000000
3                  Dog     snoR 1.0000000
4                Panda     snoR 1.0000000
5                Sheep      miR 1.0000000
6     White_rhinoceros     snoR 1.0000000
7                  Cat      miR 0.9967914
8                  Dog      miR 0.9967914
9                Panda      miR 0.9967914
10    White_rhinoceros      miR 0.9967914
11                 Cow     snoR 0.9925373
12               Sheep     snoR 0.9925373
13            Elephant     snoR 0.9875622
14              Rabbit     snoR 0.9875622
15               Shrew     snoR 0.9875622
16              Tenrec     snoR 0.9875622
17          Guinea_Pig     snoR 0.9776119
18                 Rat     snoR 0.9776119
...

My intuition is that I would have to melt the data row by row and concatenate the resulting row pairs with rbind (or some more efficient non-base R equivalent). Is there a more idiomatic built-in way to do this, i.e. to make the melted data aware that I want a species-by-species list, so that the rows for the same species stay adjacent?

e.g. something more like:

...            species variable     value
1                  Cow      miR 1.0000000
2                  Cow     snoR 0.9925373
3                  Dog     snoR 1.0000000
4                  Dog      miR 0.9967914
5                Panda     snoR 1.0000000
6                Panda      miR 0.9967914
7                Sheep      miR 1.0000000
8                Sheep     snoR 0.9925373
9     White_rhinoceros      miR 0.9967914
10    White_rhinoceros     snoR 1.0000000
...

Upvotes: 0

Views: 40

Answers (1)

jeremycg

Reputation: 24955

Starting from your wide data, I think you want to sort by the sum of the two expression values for each species:

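library(dplyr)
library(tidyr)
# dat is the wide data frame (columns: species, miR, snoR)
dat %>% mutate(new = miR + snoR) %>%                  # temporary sort key: sum of both values
        gather(type, expression, -species, -new) %>%  # wide -> long, keeping species and the key
        arrange(desc(new), species, type) %>%         # highest sum first, species rows adjacent
        select(-new)                                  # drop the temporary key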

            species type expression
1               Cat  miR  0.9967914
2               Cat snoR  1.0000000
3               Dog  miR  0.9967914
4               Dog snoR  1.0000000
5             Panda  miR  0.9967914
6             Panda snoR  1.0000000
7  White_rhinoceros  miR  0.9967914
8  White_rhinoceros snoR  1.0000000
9               Cow  miR  1.0000000
10              Cow snoR  0.9925373
11            Sheep  miR  1.0000000
12            Sheep snoR  0.9925373
13       Guinea_Pig  miR  0.9775401
14       Guinea_Pig snoR  0.9776119
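
If you would rather stay with reshape2::melt and keep the order you already set while the data was wide, one possible approach (a minimal sketch, not tested against your full table) is to store that order in the factor levels of species before melting, then sort the long data by that factor. The small species.table below is a hypothetical stand-in built from the first rows you posted, and the names wide and long are just for illustration:

```r
library(reshape2)

# Hypothetical stand-in for the asker's wide table (first few rows only)
species.table <- data.frame(
  species = c("Cow", "Sheep", "Cat", "Dog"),
  miR     = c(1.0000000, 1.0000000, 0.9967914, 0.9967914),
  snoR    = c(0.9925373, 0.9925373, 1.0000000, 1.0000000),
  stringsAsFactors = FALSE
)

# Sort while wide: base R equivalent of arrange(species.table, desc(miR), species)
wide <- species.table[order(-species.table$miR, species.table$species), ]

# Freeze that order in the factor levels so it survives melting
# (assumes each species appears only once in the wide table)
wide$species <- factor(wide$species, levels = wide$species)

# Melt as usual; 'variable' becomes a factor with levels miR, snoR
long <- melt(wide, id.vars = "species")

# Ordering by the species factor, then by variable, keeps each species' rows adjacent
long <- long[order(long$species, long$variable), ]
rownames(long) <- NULL
long
```

Because the order lives in the factor levels rather than in the row order, a plotting package that respects factor levels (ggplot2 does for discrete axes) should also draw the species in that order.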

Upvotes: 1
