Leosar
Leosar

Reputation: 2072

Reclassify a vector based in another vector R

I have two vectors, or two columns of a data.frame and I want to the first column represent values let's say A and the second groups G, I want to change the groups based on the maximum values of A by G, then I want to change the values of G so that the group are ordered by max A. For example

df <-"A  G
1.0 1
2.0 1
2.6 2
1.0 2
2.0 2
1.0 3
2.3 3"

df <- read.table(textConnection(df), header=TRUE)

aggregate(df$A, by=list(df$G), max)

then max by group are 2 2.6 2.3, and group 3 have to become 2, group 2 -> 3, and group 1 stays the same.

result should be

 A  G
1.0 1
2.0 1
2.6 3
1.0 3
2.0 3
1.0 2
2.3 2

It's easy to use agreggate or dplyr to obtain the max

require(dplyr)

df %>% group_by(G) %>% summarise(Amax=max(A))

but I don't figure out how to change the groups.

Upvotes: 4

Views: 514

Answers (3)

PavoDive
PavoDive

Reputation: 6496

We could join df with a grouped - ordered - renamed version of itself, and get the new group:

library(data.table)

setDT(df)

df[df[, max(A), by = G][order(-V1)][, newG := .I], on = "G"]

     A G  V1 newG
1: 2.6 2 2.6    1
2: 1.0 2 2.6    1
3: 2.0 2 2.6    1
4: 1.0 3 2.3    2
5: 2.3 3 2.3    2
6: 1.0 1 2.0    3
7: 2.0 1 2.0    3

What is in there:

df[, max(A), by = G] gives you a data.table of the max values of A in each group G.

[order(-V1)] sorts the values of max(A) (stored as V1) in descending order.

[, newG := .I] assings to variable newG the row number.

X[Y, on = "Z"] is the syntax for table joins in data.table (Join tables X and Y on variable Z)

If you want to remove V1, then it's achieved by appending to the end of the chained command: [, V1 := NULL].

Upvotes: 0

tushaR
tushaR

Reputation: 3116

Using data.table and match:

df=as.data.table(df)
df[,maxg:=max(A),by=G][,G:=match(maxg,sort(unique(maxg)))][,maxg=NULL]

Upvotes: 1

Iroha
Iroha

Reputation: 34761

You could use dense_rank() on the max value by group. Although it's not clear what the result should be if max values are tied.

library(dplyr)

df %>% 
  mutate(G = dense_rank(ave(A, G, FUN = max)))

    A G
1 1.0 1
2 2.0 1
3 2.6 3
4 1.0 3
5 2.0 3
6 1.0 2
7 2.3 2

Upvotes: 2

Related Questions