Reputation: 11
I have a data frame that looks like:
ID CO1 CO2 ED1 ED2 max
1 1 2 1 3 3
2 1 3 3 2 3
3 4 2 2 1 4
4 3 3 4 4 4
...
10 1 1 1 1 1
How do I get R to give me the name(s) of the columns that contain the value held in the max column, and assign them to a new column named "best"?
I want something like this:
ID CO1 CO2 ED1 ED2 max best
1 1 2 1 3 3 ED2
2 1 3 3 2 3 CO2
3 4 2 2 1 4 CO1
4 3 3 4 4 4 ED1
...
10 1 1 1 1 1 CO2
If more than one column holds the value in the max column (as in row 2 or row 10), picking one of them at random is fine.
I have seen several solutions to similar problems, but none that works in my case.
Upvotes: 1
Views: 319
Reputation: 5788
A long base R solution, where "best" contains the names of all columns that hold the max value:
# Store as a variable the names of the raw data vectors:
# dvecs => character vector
dvecs <- setdiff(names(df), c("ID", "max"))
# Store a matrix of booleans denoting if the column contains the max value:
# bool_test => logical matrix
bool_test <- df$max == df[,dvecs]
# Store a vector containing the names of the columns with the max values:
# best => character vector
df$best <- apply(
  data.frame(
    vapply(
      seq_along(dvecs),
      function(i) {
        ifelse(bool_test[, i], dvecs[i], NA_character_)
      },
      character(nrow(bool_test))
    )
  ),
  1,
  function(x) {
    paste0(na.omit(x), collapse = ", ")
  }
)
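The same "all matching columns" idea can probably be written more compactly by walking the rows of the logical matrix directly. This is only a sketch: it rebuilds the sample data (assuming max = 4 in row 3, as in the desired output) so that it runs on its own, and the names hits and best_all are made up for illustration.

```r
# Sample data, assumed to match the question's desired output
df <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

# Columns to search: everything except ID and max
dvecs <- setdiff(names(df), c("ID", "max"))

# Logical matrix: TRUE where a cell equals that row's max value
hits <- as.matrix(df[dvecs]) == df$max

# For each row, paste together the names of the matching columns
df$best_all <- apply(hits, 1, function(r) paste(dvecs[r], collapse = ", "))
df$best_all
```

A row with no matching column would get an empty string here rather than NA.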
Upvotes: 0
Reputation: 388807
You can use max.col:
cols <- grep('CO|ED', names(df), value = TRUE)
df$best <- cols[max.col(df[cols] == df$max)]
df
# ID CO1 CO2 ED1 ED2 max best
#1 1 1 2 1 3 3 ED2
#2 2 1 3 3 2 3 CO2
#3 3 4 2 2 1 4 CO1
#4 4 3 3 4 4 4 ED1
#5 10 1 1 1 1 1 ED2
You can check ties.method in ?max.col to get the first/last match in each row.
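As a rough illustration of the ties.method argument, the sketch below resolves ties deterministically instead of at random. It rebuilds the sample data so it is self-contained; the names hits, first_hit and last_hit are made up for this example.

```r
# Same sample data as in the "data" section of this answer
df <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

cols <- grep("CO|ED", names(df), value = TRUE)

# Logical matrix: TRUE where a cell equals that row's max value
hits <- df[cols] == df$max

# Leftmost matching column in each row
first_hit <- cols[max.col(hits, ties.method = "first")]
# Rightmost matching column in each row
last_hit <- cols[max.col(hits, ties.method = "last")]

first_hit
last_hit
```

Note that if a row had no matching column at all, max.col would still return some column index, so this relies on max really being present in every row.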
data
df <- structure(list(ID = c(1L, 2L, 3L, 4L, 10L), CO1 = c(1L, 1L, 4L,
3L, 1L), CO2 = c(2L, 3L, 2L, 3L, 1L), ED1 = c(1L, 3L, 2L, 4L,
1L), ED2 = c(3L, 2L, 1L, 4L, 1L), max = c(3L, 3L, 4L, 4L, 1L)),
row.names = c(NA, -5L), class = "data.frame")
Upvotes: 2
Reputation: 5419
No need to be overly fancy:
d <- read.table(text=
" ID CO1 CO2 ED1 ED2 max
1 1 2 1 3 3
2 1 3 3 2 3
3 4 2 2 1 3
4 3 3 4 4 4
10 1 1 1 1 1
", header=TRUE )
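library(dplyr)  # provides %>%, select() and matches()
# For each row, position of the first largest value among the CO/ED columns
max.columns <- d %>% select(matches("CO|ED")) %>%
  apply( 1, which.max )
# +1 skips the ID column, which sits before the CO/ED columns
d$best <- colnames(d)[ max.columns+1 ]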
d
Outputs:
> d
ID CO1 CO2 ED1 ED2 max best
1 1 1 2 1 3 3 ED2
2 2 1 3 3 2 3 CO2
3 3 4 2 2 1 3 CO1
4 4 3 3 4 4 4 ED1
5 10 1 1 1 1 1 CO1
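Two things are worth knowing about this approach: which.max looks at the row values themselves rather than at the max column, and on ties it always returns the leftmost column. A possible base R variant, sketched below, indexes into the selected column names directly so the result does not depend on where the ID column sits. The data is rebuilt from the table above, and cols is a name introduced for this example.

```r
# Same data as in the read.table() call above
d <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 3L, 4L, 1L)
)

# Names of the columns to search
cols <- grep("CO|ED", names(d), value = TRUE)

# which.max gives the position of the first largest value in each row;
# indexing cols with it needs no offset for the ID column
d$best <- cols[apply(d[cols], 1, which.max)]
d$best
```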
Upvotes: 0