Add a new column to a dataframe based on results from other columns

Question

I'm very new to R so I hope my question will be interesting. What I want to do is quite straightforward. Here's a sample of my dataset:

> head(belongliness)
   ACTIVITY_X ACTIVITY_Y ACTIVITY_Z   Event  cluster1    cluster2     cluster3    cluster4
1:         40         47         62 Head-up 0.1900989 0.768225365 0.0160654667 0.025610279
2:         60         74         95 Head-up 0.5392218 0.038558310 0.0064671635 0.415752686
3:         62         63         88 Head-up 0.7953673 0.044981152 0.0067121719 0.152939414
4:         60         56         82 Head-up 0.9941016 0.002608879 0.0003007537 0.002988748
5:         66         61         90 Head-up 0.7027407 0.048318016 0.0079239680 0.241017291
6:         60         53         80 Head-up 0.9541378 0.023338896 0.0024442116 0.020079071

I would like to add a new column, "winning_cluster", to the right of column cluster4. For each row, it should find the highest value among columns cluster1 to cluster4 and hold the name of the column that contains it.

For row 1 that will be cluster2, for row 2 cluster1, for row 3 cluster1, etc.

Any help is appreciated!

akrun · Accepted Answer

If the dataset is a data.table, specify the columns of interest in .SDcols, get the column index of the highest value in each row with max.col, use that index to select the column name, and assign the result (:=) as 'winning_cluster'.

library(data.table)
belongliness[, winning_cluster := names(.SD)[max.col(.SD)], 
           .SDcols = cluster1:cluster4]
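
If the data is a plain data.frame rather than a data.table, the same max.col idea can be sketched in base R. This is only an illustration under stated assumptions: the data frame df below is hypothetical and copies the first three rows of the sample in the question, and ties.method = "first" is an added choice so that ties resolve to the leftmost column (max.col breaks ties at random by default).

```r
# Hypothetical data frame mirroring the first three rows of the question's sample
df <- data.frame(
  Event    = c("Head-up", "Head-up", "Head-up"),
  cluster1 = c(0.1900989, 0.5392218, 0.7953673),
  cluster2 = c(0.768225365, 0.038558310, 0.044981152),
  cluster3 = c(0.0160654667, 0.0064671635, 0.0067121719),
  cluster4 = c(0.025610279, 0.415752686, 0.152939414)
)

# Names of the columns to compare
cluster_cols <- paste0("cluster", 1:4)

# max.col() returns, for each row, the position of the column holding the maximum.
# ties.method = "first" is an assumption made here to keep the result deterministic.
winner_idx <- max.col(df[cluster_cols], ties.method = "first")

# Map each position back to its column name; the new column is appended on the right
df$winning_cluster <- cluster_cols[winner_idx]
```

For these three rows the new column should agree with the values expected in the question (cluster2, cluster1, cluster1).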
