Reputation: 1984
I have a data frame like
V1 V2 V3
1 1 1 2
2 0 1 0
3 3 0 3
....
and I have a vector of the same length as the number of rows in the data frame (it's the cluster
from kmeans, if that matters)
[1] 2 2 1...
From those I can get the colMeans for each cluster, like
cm1 <- colMeans(df[fit$cluster==1,])
cm2 <- colMeans(df[fit$cluster==2,])
(I don't think I should do that part explicitly, but that's how I'm thinking about the problem.)
What I want is to get, for each column of the data frame, the cluster (the value from the vector) whose column mean is the largest. I'd also like to get (separately is fine) the cluster with the second-highest mean, the third, etc. So in the example I would want the output to be a vector with one element for each column of the data frame:
1 2 1...
because for the first column of the data frame, the column mean for the first cluster is 3, while the column mean for the second cluster is 0.5.
Upvotes: 1
Views: 166
Reputation: 887038
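If the cluster vector is of the same length as the number of rows of 'df', split the data by cluster into a list, get the colMeans of each piece, and find the position of the largest mean for each column. Note that which.max returns a position within each column's group; it matches the cluster label here because the pieces are bound in cluster order and kmeans numbers its clusters 1, 2, ..., k.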
lst1 <- lapply(split(df, fit$cluster), function(x) stack(colMeans(x)))
dat <- do.call(rbind, Map(cbind, cluster = names(lst1), lst1))
aggregate(values ~ ind, dat, FUN = which.max)
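If the full ranking of clusters is wanted for each column (highest, second-highest, and so on), one possible base R sketch is to build a cluster-by-column matrix of means and order each column of it. The names cluster_vec, cm and ranked are illustrative and not part of the original code, and the sketch assumes at least two clusters so that sapply returns a matrix.

```r
# Example data from the question (cluster_vec plays the role of fit$cluster)
df <- data.frame(V1 = c(1, 0, 3), V2 = c(1, 1, 0), V3 = c(2, 0, 3))
cluster_vec <- c(2, 2, 1)

# Matrix of means: one row per cluster, one column per column of df
cm <- sapply(df, function(col) tapply(col, cluster_vec, mean))

# For each column, the cluster labels sorted by decreasing mean
ranked <- apply(cm, 2, function(m) {
  as.integer(rownames(cm))[order(m, decreasing = TRUE)]
})

ranked[1, ]  # cluster with the highest mean, per column
ranked[2, ]  # cluster with the second-highest mean, per column
```

Row i of ranked holds the cluster with the i-th largest mean for each column, and the labels are taken from the row names of cm rather than from positions, so this should also work when the cluster labels are not 1, 2, ..., k.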
If we need to return multiple elements per column based on the column means, create the 'cluster' column in the data, reshape to 'long' format (or use summarise/across), group by 'cluster' and 'name', get the mean of 'value', arrange by 'name' and then by 'value' in descending order, and finally return the first n rows of each 'name' group with slice_head
library(dplyr)
library(tidyr)
df %>%
mutate(cluster = fit$cluster) %>%
pivot_longer(cols = -cluster) %>%
group_by(cluster, name) %>%
summarise(value = mean(value), .groups = 'drop') %>%
arrange(name, desc(value)) %>%
group_by(name) %>%
slice_head(n = 2)
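To pull out only the k-th ranked cluster as a plain named vector, a variation on the same pipeline could end with slice(k) instead of slice_head. This is a sketch under the assumption that dplyr (1.0.0 or later) and tidyr are installed; cluster_vec, k and kth are illustrative names, with cluster_vec standing in for fit$cluster.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(V1 = c(1, 0, 3), V2 = c(1, 1, 0), V3 = c(2, 0, 3))
cluster_vec <- c(2, 2, 1)

k <- 2  # which rank to extract: 1 = highest mean, 2 = second-highest, ...

kth <- df %>%
  mutate(cluster = cluster_vec) %>%
  pivot_longer(cols = -cluster) %>%
  group_by(cluster, name) %>%
  summarise(value = mean(value), .groups = 'drop') %>%
  arrange(name, desc(value)) %>%
  group_by(name) %>%
  slice(k) %>%                   # keep only the k-th row of each column's ranking
  ungroup() %>%
  pull(cluster, name = name)     # named vector: one cluster label per column

kth
```

A column with fewer than k clusters contributes no row after slice(k), so it would be missing from the result rather than returned as NA.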
Data used in the examples:
df <- structure(list(V1 = c(1L, 0L, 3L), V2 = c(1L, 1L, 0L), V3 = c(2L, 0L, 3L)),
  class = "data.frame", row.names = c("1", "2", "3"))
fit <- structure(list(cluster = c(2, 2, 1)), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 1