Reputation: 161
Hi I have a dataframe like this:
x <- data.frame("Group"= c(rep(letters[1],2),rep(letters[2],2),rep(letters[3],2),rep(letters[4],2),letters[5]) , "Treatment"= c(rep(c("ATR","Co"),4),"Gen"), "Mean" = 1:9)
Want I want to do with it is order it in a custom way, namely move the rows of "Gen" Treatment to the first row. I could find them with something like:
if( "Genetic Control", idGroup$Treatment)
match("Genetic Control",idGroup$Treatment)
if(grepl("Gen",x$Treatment...
But I couldn´t find any function that moves the row above to the first row. Once I manage this, I want to order the rest of the data by the mean of x$mean of the two components of each group. For instance, in this case the mean of the two components of "d" is 7.5 and so it will go in the following rows after "Gen" treatment. In this example the dataframe should look like this:
x <- data.frame("Group"= c(letters[5], rep(letters[4],2),rep(letters[3],2),rep(letters[2],2),rep(letters[1],2)), "Treatment"= c("Gen", rep(c("ATR","Co"),4)), "Mean" = 9:1)
Upvotes: 0
Views: 58
Reputation: 738
Here is an updated answer using the R package sqldf (click link for a brief intro).
library(sqldf)
x <- data.frame(
"Group"= c(rep(letters[1],2),rep(letters[2],2),rep(letters[3],2),rep(letters[4],2),letters[5]),
"Treatment"= c(rep(c("ATR","Co"),4),"Gen"),
"Mean" = 1:9)
Firstly, you can get the mean of column Mean
for each group with this statement (similar to aggregate
in R):
sqldf("
SELECT
`Group` AS `Group`,
AVG(`Mean`) AS `GroupMean`
FROM x
GROUP BY `Group`;")
Group GroupMean
1 a 1.5
2 b 3.5
3 c 5.5
4 d 7.5
5 e 9.0
Then it is a case of using the JOIN
statement (like merge
in R) to join this table to the original one, putting 'Gen' at the top and then sorting by GroupMean
. I call these these tables t1
and t2
, join them together, and then select from them the columns I want, and sorting the table. I have formatted the query so hopefully it is easier to understand. It should help to read a couple of articles on MySQL
or sqldf
. Also, as per the comment by @G. Grothendieck, you can use ` to enclose column names and ' to enclose strings. Hope this helps.
sqldf("
SELECT
t1.`Group` AS `Group`,
t1.`Treatment` AS `Treatment`,
t1.`Mean` AS `Mean`,
t2.`GroupMean` AS `GroupMean`
FROM
(SELECT * FROM x) t1
JOIN
(SELECT
`Group` AS `Group`,
AVG(`Mean`) AS `GroupMean`
FROM x
GROUP BY `Group`) t2
ON t1.`Group` = t2.`Group`
ORDER BY CASE `Treatment` WHEN 'Gen' THEN 1 ELSE 2 END,
`GroupMean` DESC,
`Mean` DESC;
")
Group Treatment Mean GroupMean
1 e Gen 9 9.0
2 d Co 8 7.5
3 d ATR 7 7.5
4 c Co 6 5.5
5 c ATR 5 5.5
6 b Co 4 3.5
7 b ATR 3 3.5
8 a Co 2 1.5
9 a ATR 1 1.5
Upvotes: 1
Reputation: 5951
Try this
x$Treatment <- ordered(x$Treatment, levels = c("Gen", "ATR", "Co"))
x <- x[order(x$Treatment), ]
Define the levels
in the order you want them
Use unique(x$Treatment)
to see the Treatments you have.
For the second part of your question i am not sure i understand. Try this and let me know if it works:
library(dplyr)
x$rank <- (x$Treatment!="Gen")*1
x <- x %>% group_by(Group) %>% mutate(temp=mean(Mean)) %>%
ungroup %>% arrange(rank, -temp) %>% select(-rank, -temp)
Upvotes: 0