Reputation: 533
I'm very new to R so I hope my question will be interesting. What I want to do is quite straightforward. Here's a sample of my dataset:
> head(belongliness)
ACTIVITY_X ACTIVITY_Y ACTIVITY_Z Event cluster1 cluster2 cluster3 cluster4
1: 40 47 62 Head-up 0.1900989 0.768225365 0.0160654667 0.025610279
2: 60 74 95 Head-up 0.5392218 0.038558310 0.0064671635 0.415752686
3: 62 63 88 Head-up 0.7953673 0.044981152 0.0067121719 0.152939414
4: 60 56 82 Head-up 0.9941016 0.002608879 0.0003007537 0.002988748
5: 66 61 90 Head-up 0.7027407 0.048318016 0.0079239680 0.241017291
6: 60 53 80 Head-up 0.9541378 0.023338896 0.0024442116 0.020079071
I would like to create a new column "winning cluster" to the right of column "cluster4". For each row, "winning cluster" should find the highest value among columns "cluster1" to "cluster4" and hold the name of that column. For row 1 that will be cluster2, for row 2 cluster1, for row 3 cluster1, etc.
Any help is appreciated!
Upvotes: 1
Views: 67
Reputation: 5776
In base R, the highest value in each row is easy to get:
belongliness$`winning cluster` = apply(belongliness[,5:8], 1, max)
where belongliness[,5:8] corresponds to columns cluster1 through cluster4.
Or, if you want the name of the winning column rather than its value:
belongliness$`winning cluster` = apply(belongliness[,5:8], 1, which.max)
belongliness$`winning cluster` = paste0('cluster', belongliness$`winning cluster`)
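Edit: the right-hand side of the first line above is essentially max.col (note that max.col breaks ties at random by default, whereas which.max returns the first maximum):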
belongliness$`winning cluster` = max.col(belongliness[,5:8])
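As with the which.max version, this stores the column position, so the paste0 step is still needed to turn it into a name. A minimal self-contained sketch of the whole idea follows. The toy data frame df, its rounded values, and the helper names cluster_cols and winner_idx are made up for illustration and are not part of the question's data; ties.method = "first" is my choice to keep the result deterministic.

```r
# Toy data loosely mirroring the first three rows of the question (values rounded).
df <- data.frame(
  Event    = c("Head-up", "Head-up", "Head-up"),
  cluster1 = c(0.19, 0.54, 0.80),
  cluster2 = c(0.77, 0.04, 0.04),
  cluster3 = c(0.02, 0.01, 0.01),
  cluster4 = c(0.03, 0.42, 0.15)
)

# Select the cluster columns by name instead of by position (5:8).
cluster_cols <- paste0("cluster", 1:4)

# max.col() gives, for each row, the position of the largest value.
# ties.method = "first" makes tie-breaking deterministic (the default is "random").
winner_idx <- max.col(as.matrix(df[, cluster_cols]), ties.method = "first")

# Map the positions back to column names.
df$winning_cluster <- cluster_cols[winner_idx]

print(df$winning_cluster)
# [1] "cluster2" "cluster1" "cluster1"
```

Selecting by name rather than by 5:8 keeps the code working if columns are added or reordered.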
Upvotes: 1
Reputation: 887821
If the dataset is a data.table, specify the columns of interest in .SDcols, get the column index of the highest value in each row with max.col, use that index to select the column name, and assign (:=) the result as 'winning_cluster':
library(data.table)
belongliness[, winning_cluster := names(.SD)[max.col(.SD)],
.SDcols = cluster1:cluster4]
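For anyone who wants to try this without the original data, here is a small sketch under some assumptions: the table dt and its rounded values are invented for illustration, the data.table package is installed, .SDcols is given as a character vector (cols) instead of a column range, and ties.method = "first" is added so ties are not broken at random.

```r
library(data.table)

# Hypothetical two-row table standing in for the question's data.
dt <- data.table(
  Event    = c("Head-up", "Head-up"),
  cluster1 = c(0.19, 0.54),
  cluster2 = c(0.77, 0.04),
  cluster3 = c(0.02, 0.01),
  cluster4 = c(0.03, 0.42)
)

cols <- paste0("cluster", 1:4)

# .SD holds only the columns listed in .SDcols; max.col() returns the
# per-row position of the maximum, which indexes into names(.SD).
dt[, winning_cluster := names(.SD)[max.col(.SD, ties.method = "first")],
   .SDcols = cols]

print(dt$winning_cluster)
# [1] "cluster2" "cluster1"
```

Because := appends new columns at the end of the table, winning_cluster lands to the right of cluster4 as long as cluster4 is the last column, as in the question.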
Upvotes: 2