Reputation: 353
I am fairly new to R. I have data that looks like this:
dataset:
    a  b  c  d
r1  1  3  4  6
r2 12 13 11  4
r3 12 94 12  0
r4  0  2  5  0
r5  3  1  4  1
I would like to know which column has the highest value in each row:
r1: d
r2: b
r3: b
r4: c
r5: c
Also, how would I extend this to a larger dataset, for example to find the 5 largest columns (in order) and the 5 lowest columns (in order)?
Upvotes: 0
Views: 171
Reputation: 43334
Subsetting names with max.col is handy:
# a matrix makes sense for this data
x <- structure(c(1L, 12L, 12L, 0L, 3L, 3L, 13L, 94L, 2L, 1L, 4L, 11L,
12L, 5L, 4L, 6L, 4L, 0L, 0L, 1L), .Dim = c(5L, 4L), .Dimnames = list(
c("r1", "r2", "r3", "r4", "r5"), c("a", "b", "c", "d")))
# column name of row maximum
colnames(x)[max.col(x)]
#> [1] "d" "b" "b" "c" "c"
# column name of row minimum; max.col() breaks ties at random by default,
# so ties.method = "first" is set here to return the first occurrence
colnames(x)[max.col(-x, ties.method = "first")]
#> [1] "a" "d" "d" "a" "b"
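# row name of column maximum; column a is tied between r2 and r3,
# so ties.method = "first" keeps the result reproducible
rownames(x)[max.col(t(x), ties.method = "first")]
#> [1] "r2" "r3" "r3" "r1"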
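As a rough illustration of why tie handling matters with max.col, the sketch below rebuilds the same matrix and compares ties.method = "first" with ties.method = "last" for the row minima. The variable names min_first and min_last are only for illustration. With the default ties.method = "random", the answer for tied rows (r4 and r5 here) can change from run to run.

```r
# Same values as x above, written out row-name by row-name for readability
x <- matrix(
  c(1L, 12L, 12L, 0L, 3L,
    3L, 13L, 94L, 2L, 1L,
    4L, 11L, 12L, 5L, 4L,
    6L, 4L, 0L, 0L, 1L),
  nrow = 5,
  dimnames = list(paste0("r", 1:5), c("a", "b", "c", "d"))
)

# r4 is 0 2 5 0 (minimum tied between a and d)
# r5 is 3 1 4 1 (minimum tied between b and d)
min_first <- colnames(x)[max.col(-x, ties.method = "first")]
min_last  <- colnames(x)[max.col(-x, ties.method = "last")]

min_first
#> [1] "a" "d" "d" "a" "b"
min_last
#> [1] "a" "d" "d" "d" "d"
```

Only the tied rows differ between the two results, so picking an explicit ties.method is mostly about getting the same answer every time.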
Upvotes: 1
Reputation: 32548
Use apply to check which element is the maximum in each row, and then obtain the corresponding column name:
apply(df, 1, function(x) colnames(df)[which.max(x)])
# r1 r2 r3 r4 r5
#"d" "b" "b" "c" "c"
For the columns corresponding to the top two values:
# use decreasing = FALSE instead for the lowest two values
apply(X = df, MARGIN = 1, function(x)
  colnames(df)[order(x, decreasing = TRUE)[1:2]])
# r1 r2 r3 r4 r5
#[1,] "d" "b" "b" "c" "c"
#[2,] "c" "a" "a" "b" "a"
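The same idea could be generalised to the largest or lowest n columns that the question asks about. The following is a minimal sketch, not a definitive implementation: top_cols is a hypothetical helper name, and it assumes the data are all numeric and that ties should keep their original column order (which is what order does).

```r
# Same data as df in the DATA section
df <- data.frame(
  a = c(1L, 12L, 12L, 0L, 3L),
  b = c(3L, 13L, 94L, 2L, 1L),
  c = c(4L, 11L, 12L, 5L, 4L),
  d = c(6L, 4L, 0L, 0L, 1L),
  row.names = c("r1", "r2", "r3", "r4", "r5")
)

# Hypothetical helper: column names of the n largest (decreasing = TRUE)
# or n lowest (decreasing = FALSE) values in each row, in order
top_cols <- function(data, n = 5, decreasing = TRUE) {
  m <- as.matrix(data)
  n <- min(n, ncol(m))  # cannot return more columns than exist
  res <- vapply(seq_len(nrow(m)), function(i) {
    colnames(m)[order(m[i, ], decreasing = decreasing)[seq_len(n)]]
  }, character(n))
  # vapply() gives a plain vector when n == 1, so rebuild the matrix explicitly
  res <- matrix(res, nrow = n, dimnames = list(NULL, rownames(m)))
  t(res)  # one row per input row, one column per rank
}

top2    <- top_cols(df, n = 2)                      # two largest per row
bottom2 <- top_cols(df, n = 2, decreasing = FALSE)  # two lowest per row
```

For a dataset with more columns, top_cols(df, n = 5) and top_cols(df, n = 5, decreasing = FALSE) would give the five largest and five lowest columns per row. The result here has one row per input row, which is the transpose of the apply output above.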
DATA
df = structure(list(a = c(1L, 12L, 12L, 0L, 3L), b = c(3L, 13L, 94L,
2L, 1L), c = c(4L, 11L, 12L, 5L, 4L), d = c(6L, 4L, 0L, 0L, 1L
)), .Names = c("a", "b", "c", "d"), class = "data.frame", row.names = c("r1",
"r2", "r3", "r4", "r5"))
Upvotes: 4