Reputation: 2072
I have two vectors, or two columns of a data.frame and I want to the first column represent values let's say A and the second groups G, I want to change the groups based on the maximum values of A by G, then I want to change the values of G so that the group are ordered by max A. For example
df <-"A G
1.0 1
2.0 1
2.6 2
1.0 2
2.0 2
1.0 3
2.3 3"
df <- read.table(textConnection(df), header=TRUE)
aggregate(df$A, by=list(df$G), max)
then max by group are 2 2.6 2.3, and group 3 have to become 2, group 2 -> 3, and group 1 stays the same.
result should be
A G
1.0 1
2.0 1
2.6 3
1.0 3
2.0 3
1.0 2
2.3 2
It's easy to use agreggate or dplyr to obtain the max
require(dplyr)
df %>% group_by(G) %>% summarise(Amax=max(A))
but I don't figure out how to change the groups.
Upvotes: 4
Views: 514
Reputation: 6496
We could join df
with a grouped - ordered - renamed version of itself, and get the new group:
library(data.table)
setDT(df)
df[df[, max(A), by = G][order(-V1)][, newG := .I], on = "G"]
A G V1 newG
1: 2.6 2 2.6 1
2: 1.0 2 2.6 1
3: 2.0 2 2.6 1
4: 1.0 3 2.3 2
5: 2.3 3 2.3 2
6: 1.0 1 2.0 3
7: 2.0 1 2.0 3
What is in there:
df[, max(A), by = G]
gives you a data.table of the max values of A in each group G.
[order(-V1)]
sorts the values of max(A)
(stored as V1) in descending order.
[, newG := .I]
assings to variable newG
the row number.
X[Y, on = "Z"]
is the syntax for table joins in data.table (Join tables X and Y on variable Z)
If you want to remove V1
, then it's achieved by appending to the end of the chained command: [, V1 := NULL]
.
Upvotes: 0
Reputation: 3116
Using data.table
and match
:
df=as.data.table(df)
df[,maxg:=max(A),by=G][,G:=match(maxg,sort(unique(maxg)))][,maxg=NULL]
Upvotes: 1
Reputation: 34761
You could use dense_rank()
on the max value by group. Although it's not clear what the result should be if max values are tied.
library(dplyr)
df %>%
mutate(G = dense_rank(ave(A, G, FUN = max)))
A G
1 1.0 1
2 2.0 1
3 2.6 3
4 1.0 3
5 2.0 3
6 1.0 2
7 2.3 2
Upvotes: 2