juansalix

Reputation: 533

Add a new column to a dataframe based on results from other columns

I'm very new to R, so I hope my question is a worthwhile one. What I want to do is quite straightforward. Here's a sample of my dataset:

> head(belongliness)
   ACTIVITY_X ACTIVITY_Y ACTIVITY_Z   Event  cluster1    cluster2     cluster3    cluster4
1:         40         47         62 Head-up 0.1900989 0.768225365 0.0160654667 0.025610279
2:         60         74         95 Head-up 0.5392218 0.038558310 0.0064671635 0.415752686
3:         62         63         88 Head-up 0.7953673 0.044981152 0.0067121719 0.152939414
4:         60         56         82 Head-up 0.9941016 0.002608879 0.0003007537 0.002988748
5:         66         61         90 Head-up 0.7027407 0.048318016 0.0079239680 0.241017291
6:         60         53         80 Head-up 0.9541378 0.023338896 0.0024442116 0.020079071

I would like to add a new column, "winning cluster", to the right of column "cluster4". For each row, it should find the highest value among columns "cluster1" to "cluster4" and display the name of that column.

For row 1 that would be cluster2; for row 2, cluster1; for row 3, cluster1; and so on.
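To make the question reproducible, here is a sketch that rebuilds the six sample rows above as a plain data.frame. This is an assumption: the "1:", "2:", ... row labels in the printout suggest the original object is a data.table, but a data.frame is enough for the base R answers. The vector expected is a hypothetical name that records the winning cluster of each row, read off the table.

```r
# Rebuild the six sample rows shown above (values copied from head(belongliness))
belongliness <- data.frame(
  ACTIVITY_X = c(40, 60, 62, 60, 66, 60),
  ACTIVITY_Y = c(47, 74, 63, 56, 61, 53),
  ACTIVITY_Z = c(62, 95, 88, 82, 90, 80),
  Event      = rep("Head-up", 6),
  cluster1   = c(0.1900989, 0.5392218, 0.7953673, 0.9941016, 0.7027407, 0.9541378),
  cluster2   = c(0.768225365, 0.038558310, 0.044981152, 0.002608879, 0.048318016, 0.023338896),
  cluster3   = c(0.0160654667, 0.0064671635, 0.0067121719, 0.0003007537, 0.0079239680, 0.0024442116),
  cluster4   = c(0.025610279, 0.415752686, 0.152939414, 0.002988748, 0.241017291, 0.020079071)
)

# Hypothetical: the column that holds the largest cluster value in each row
expected <- c("cluster2", "cluster1", "cluster1", "cluster1", "cluster1", "cluster1")
```

With this object, belongliness[,5:8] refers to cluster1 through cluster4, matching the column positions the base R answers use.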

Any help is appreciated!

Upvotes: 1

Views: 67

Answers (2)

MrGumble

Reputation: 5776

In base R, this is easily done:

belongliness$`winning cluster` = apply(belongliness[,5:8], 1, max)

where belongliness[,5:8] corresponds to columns cluster1 through cluster4.
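If you only need the row-wise maximum value, a base R alternative to apply() is pmax(). This is a swapped-in technique, not part of the answer above. A data.frame is a list of columns, so do.call() can pass each cluster column to pmax() as a separate argument, which avoids converting the data to a matrix row by row. The sketch below assumes a plain data.frame with the scores in columns 5 to 8. The column name max_score is hypothetical.

```r
# Two-row stand-in for belongliness, copied from the first rows in the question
belongliness <- data.frame(
  ACTIVITY_X = c(40, 60), ACTIVITY_Y = c(47, 74), ACTIVITY_Z = c(62, 95),
  Event = c("Head-up", "Head-up"),
  cluster1 = c(0.1900989, 0.5392218), cluster2 = c(0.768225365, 0.038558310),
  cluster3 = c(0.0160654667, 0.0064671635), cluster4 = c(0.025610279, 0.415752686)
)

# do.call() hands each selected column to pmax() as its own argument;
# pmax() then takes the element-wise (i.e. row-wise) maximum across them.
# "max_score" is a hypothetical column name.
belongliness$max_score <- do.call(pmax, belongliness[, 5:8])
belongliness$max_score
```

This gives the same values as apply(belongliness[,5:8], 1, max), but it still returns the value, not the name of the winning column.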

Or, if you want the position of the winning column (turned into its name):

belongliness$`winning cluster` = apply(belongliness[,5:8], 1, which.max)
belongliness$`winning cluster` = paste0('cluster', belongliness$`winning cluster`)
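The paste0('cluster', ...) step assumes the columns are named cluster1 to cluster4 in that order. The sketch below looks up the actual column names instead, and it also selects the score columns by name rather than by position. Selecting columns whose names start with "cluster" is an assumption about how the real columns are named.

```r
# Three-row stand-in for belongliness (values from the question's sample)
belongliness <- data.frame(
  Event    = c("Head-up", "Head-up", "Head-up"),
  cluster1 = c(0.1900989, 0.5392218, 0.7953673),
  cluster2 = c(0.768225365, 0.038558310, 0.044981152),
  cluster3 = c(0.0160654667, 0.0064671635, 0.0067121719),
  cluster4 = c(0.025610279, 0.415752686, 0.152939414)
)

# Pick the score columns by name instead of by position
cluster_cols <- names(belongliness)[startsWith(names(belongliness), "cluster")]
scores <- as.matrix(belongliness[, cluster_cols])

# which.max() returns the position of the first maximum in each row;
# indexing colnames() with that position turns it into the column name
belongliness$`winning cluster` <- colnames(scores)[apply(scores, 1, which.max)]
belongliness$`winning cluster`
```

Because which.max() takes the first maximum, a tie goes to the leftmost cluster column.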

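Edit: the right-hand side of the which.max line is essentially max.col, which also returns column positions (by default max.col breaks ties at random, whereas which.max takes the first maximum):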

belongliness$`winning cluster` = max.col(belongliness[,5:8])
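Since max.col() returns positions, the names still have to be looked up. Because its default ties.method = "random" can give different answers on repeated runs when a row has two equal maxima, passing ties.method = "first" makes it match which.max(). The data below is made up to create a tie and is not from the question.

```r
# Made-up scores: row 2 has equal maxima in cluster1 and cluster3
scores <- data.frame(
  cluster1 = c(0.19, 0.50),
  cluster2 = c(0.77, 0.20),
  cluster3 = c(0.02, 0.50),
  cluster4 = c(0.02, 0.00)
)

# max.col() coerces the data.frame to a matrix and returns one column index per row;
# "first" resolves ties by taking the leftmost maximum, like which.max()
idx <- max.col(scores, ties.method = "first")
winning <- names(scores)[idx]
print(winning)  # [1] "cluster2" "cluster1"
```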

Upvotes: 1

akrun

Reputation: 887821

If the dataset is a data.table (the "1:" row labels in your head() output suggest it is), specify the columns of interest in .SDcols, get the column index of the highest value in each row with max.col, use that index to select the column name, and assign it by reference (:=) as 'winning_cluster':

library(data.table)
belongliness[, winning_cluster := names(.SD)[max.col(.SD)], 
           .SDcols = cluster1:cluster4]

Upvotes: 2
