Reputation: 1111
I'm looking for a way to create a new column with values based on the value in the first column, with the highest score in the same cluster.
The goal is to move from this:
WORD | SCORE | Cluster |
diptyqu | 19750 | 1 |
daptyqu | 8830 | 1 |
dyptiqu | 29 | 1 |
deptiqu | 2 | 1 |
parfume | 300 | 2 |
parfum | 14 | 2 |
To this:
WORD | SCORE | Cluster | Name |
diptyqu | 19750 | 1 | diptyqu |
daptyqu | 8830 | 1 | diptyqu |
dyptiqu | 29 | 1 | diptyqu |
deptiqu | 2 | 1 | diptyqu |
parfume | 300 | 2 | parfume |
parfum | 14 | 2 | parfume |
I'm litteraly stuck and don't really know how to make it. Do you have any suggestion on a method to do this?
Upvotes: 0
Views: 258
Reputation: 126
You can use 'data.table' packages and merge function:
library(data.table)
df2 <- setDT(df)[, .SD[which.max(SCORE)], Cluster]
result <- merge(df, df2[, c('Cluster', 'WORD')], by='Cluster')
Upvotes: 0
Reputation: 32538
do.call(rbind,
lapply(split(dat, dat$Cluster), function(a)
data.frame(a, Name = a$WORD[which.max(a$SCORE)])))
# WORD SCORE Cluster Name
#1.1 diptyqu 19750 1 diptyqu
#1.2 daptyqu 8830 1 diptyqu
#1.3 dyptiqu 29 1 diptyqu
#1.4 deptiqu 2 1 diptyqu
#2.5 parfume 300 2 parfume
#2.6 parfum 14 2 parfume
Upvotes: 1
Reputation: 28309
Similar strategy as in @Gregor's dplyr
solution, but using data.table
:
library(data.table)
setDT(d)
d[order(SCORE), Name := tail(WORD, 1), Cluster]
Upvotes: 2
Reputation: 145745
Using dplyr
your_data %>%
group_by(cluster) %>%
mutate(Name = first(WORD))
It assumes your rows are already sorted by SCORE. If needed, you can do that sorting too:
your_data %>%
group_by(cluster) %>%
arrange(desc(SCORE)) %>%
mutate(Name = first(WORD))
Upvotes: 2