chiser
chiser

Reputation: 161

Ordering data frame by different factors

Hi I have a dataframe like this:

x <- data.frame("Group"= c(rep(letters[1],2),rep(letters[2],2),rep(letters[3],2),rep(letters[4],2),letters[5]) , "Treatment"=   c(rep(c("ATR","Co"),4),"Gen"), "Mean" = 1:9)

Want I want to do with it is order it in a custom way, namely move the rows of "Gen" Treatment to the first row. I could find them with something like:

if( "Genetic Control", idGroup$Treatment)
match("Genetic Control",idGroup$Treatment)
if(grepl("Gen",x$Treatment...

But I couldn´t find any function that moves the row above to the first row. Once I manage this, I want to order the rest of the data by the mean of x$mean of the two components of each group. For instance, in this case the mean of the two components of "d" is 7.5 and so it will go in the following rows after "Gen" treatment. In this example the dataframe should look like this:

  x <- data.frame("Group"= c(letters[5], rep(letters[4],2),rep(letters[3],2),rep(letters[2],2),rep(letters[1],2)), "Treatment"=   c("Gen", rep(c("ATR","Co"),4)), "Mean" = 9:1)

Upvotes: 0

Views: 58

Answers (2)

Jeff
Jeff

Reputation: 738

Here is an updated answer using the R package sqldf (click link for a brief intro).

library(sqldf)

x <- data.frame(
        "Group"= c(rep(letters[1],2),rep(letters[2],2),rep(letters[3],2),rep(letters[4],2),letters[5]), 
        "Treatment"=   c(rep(c("ATR","Co"),4),"Gen"), 
        "Mean" = 1:9)

Firstly, you can get the mean of column Mean for each group with this statement (similar to aggregate in R):

sqldf("
SELECT 
    `Group` AS `Group`, 
    AVG(`Mean`) AS `GroupMean` 
FROM x 
GROUP BY `Group`;")

  Group GroupMean
1     a       1.5
2     b       3.5
3     c       5.5
4     d       7.5
5     e       9.0

Then it is a case of using the JOIN statement (like merge in R) to join this table to the original one, putting 'Gen' at the top and then sorting by GroupMean. I call these these tables t1 and t2, join them together, and then select from them the columns I want, and sorting the table. I have formatted the query so hopefully it is easier to understand. It should help to read a couple of articles on MySQL or sqldf. Also, as per the comment by @G. Grothendieck, you can use ` to enclose column names and ' to enclose strings. Hope this helps.

sqldf("
SELECT 
    t1.`Group` AS `Group`, 
    t1.`Treatment` AS `Treatment`, 
    t1.`Mean` AS `Mean`, 
    t2.`GroupMean` AS `GroupMean` 
FROM
    (SELECT * FROM x) t1
    JOIN
    (SELECT 
        `Group` AS `Group`, 
        AVG(`Mean`) AS `GroupMean` 
    FROM x 
    GROUP BY `Group`) t2
    ON t1.`Group` = t2.`Group`
ORDER BY CASE `Treatment` WHEN 'Gen' THEN 1 ELSE 2 END, 
    `GroupMean` DESC, 
    `Mean` DESC;
")

  Group Treatment Mean GroupMean
1     e       Gen    9       9.0
2     d        Co    8       7.5
3     d       ATR    7       7.5
4     c        Co    6       5.5
5     c       ATR    5       5.5
6     b        Co    4       3.5
7     b       ATR    3       3.5
8     a        Co    2       1.5
9     a       ATR    1       1.5

Upvotes: 1

dimitris_ps
dimitris_ps

Reputation: 5951

Try this

x$Treatment <- ordered(x$Treatment, levels = c("Gen", "ATR", "Co")) 
x <- x[order(x$Treatment), ]

Define the levels in the order you want them Use unique(x$Treatment) to see the Treatments you have.

For the second part of your question i am not sure i understand. Try this and let me know if it works:

library(dplyr)

x$rank <- (x$Treatment!="Gen")*1
x <- x %>% group_by(Group) %>% mutate(temp=mean(Mean)) %>% 
      ungroup %>% arrange(rank, -temp) %>% select(-rank, -temp)

Upvotes: 0

Related Questions