Reputation: 143
I would like to calculate the mean value for each of my variables, and then I would like to create a list of the names of variables with the 3 largest mean values.
I will then use this list to subset my dataframe and will only include the 3 selected variables in additional analysis.
I'm close, but I can't quite work out how to write the code efficiently, and I'm trying to use pipes for the first time.
Here is a simplified dataset.
FA1 <- c(0.68, 0.79, 0.65, 0.72, 0.79, 0.78, 0.77, 0.67, 0.77, 0.7)
FA2 <- c(0.08, 0.12, 0.07, 0.13, 0.09, 0.12, 0.13, 0.08, 0.17, 0.09)
FA3 <- c(0.1, 0.06, 0.08, 0.09, 0.06, 0.08, 0.09, 0.09, 0.06, 0.08)
FA4 <- c(0.17, 0.11, 0.19, 0.13, 0.14, 0.14, 0.13, 0.16, 0.11, 0.16)
FA5 <- c(2.83, 0.9, 3.87, 1.55, 1.91, 1.46, 1.68, 2.5, 3.0, 1.45)
df <- data.frame(FA1, FA2, FA3, FA4, FA5)
And here is the piece of code I've written that doesn't quite get me what I want.
library(magrittr)  # provides the %>% pipe
colMeans(df) %>% rank()
Upvotes: 0
Views: 59
Reputation: 16277
First, identify the three columns with the highest means. I use colMeans to calculate the column means, then sort the means in decreasing order and keep only the first three, which are the three largest.
three <- sort(colMeans(df), decreasing = TRUE)[1:3]
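Since the question asks for a list of variable names, it may help to see what this step produces. The sketch below rebuilds the sample data from the question so it runs on its own; `top_names` is a name I made up for illustration.

```r
# Rebuild the sample data from the question so the example is self-contained.
df <- data.frame(
  FA1 = c(0.68, 0.79, 0.65, 0.72, 0.79, 0.78, 0.77, 0.67, 0.77, 0.7),
  FA2 = c(0.08, 0.12, 0.07, 0.13, 0.09, 0.12, 0.13, 0.08, 0.17, 0.09),
  FA3 = c(0.1, 0.06, 0.08, 0.09, 0.06, 0.08, 0.09, 0.09, 0.06, 0.08),
  FA4 = c(0.17, 0.11, 0.19, 0.13, 0.14, 0.14, 0.13, 0.16, 0.11, 0.16),
  FA5 = c(2.83, 0.9, 3.87, 1.55, 1.91, 1.46, 1.68, 2.5, 3.0, 1.45)
)

# colMeans() returns a named numeric vector; sorting keeps the names attached.
three <- sort(colMeans(df), decreasing = TRUE)[1:3]

# The names of that vector are the variable names being asked for.
# (top_names is a hypothetical name, not from the original answer.)
top_names <- names(three)

print(three)      # the three largest means, largest first
print(top_names)  # the matching column names: FA5, FA1, FA4
```

One caveat: with fewer than three columns, `[1:3]` pads the result with `NA`, so `head(x, 3)` is probably the safer choice if the number of columns can vary.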
Then, keep only those columns.
df[, names(three)]
    FA5  FA1  FA4
1  2.83 0.68 0.17
2  0.90 0.79 0.11
3  3.87 0.65 0.19
4  1.55 0.72 0.13
5  1.91 0.79 0.14
6  1.46 0.78 0.14
7  1.68 0.77 0.13
8  2.50 0.67 0.16
9  3.00 0.77 0.11
10 1.45 0.70 0.16
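Since you mentioned wanting to use pipes, here is a rough sketch of the same steps written as one pipeline. It assumes the magrittr package is installed, and `top_names` and `df_top` are names I made up for illustration.

```r
library(magrittr)  # assumption: magrittr is installed; it provides %>%

# Rebuild the sample data from the question so the example is self-contained.
df <- data.frame(
  FA1 = c(0.68, 0.79, 0.65, 0.72, 0.79, 0.78, 0.77, 0.67, 0.77, 0.7),
  FA2 = c(0.08, 0.12, 0.07, 0.13, 0.09, 0.12, 0.13, 0.08, 0.17, 0.09),
  FA3 = c(0.1, 0.06, 0.08, 0.09, 0.06, 0.08, 0.09, 0.09, 0.06, 0.08),
  FA4 = c(0.17, 0.11, 0.19, 0.13, 0.14, 0.14, 0.13, 0.16, 0.11, 0.16),
  FA5 = c(2.83, 0.9, 3.87, 1.55, 1.91, 1.46, 1.68, 2.5, 3.0, 1.45)
)

# Each step passes its result to the next:
# column means -> sort descending -> keep the first three -> take their names.
top_names <- df %>%
  colMeans() %>%
  sort(decreasing = TRUE) %>%
  head(3) %>%
  names()

# Subset the data frame to the selected columns for further analysis.
df_top <- df[, top_names]

print(top_names)
```

In R 4.1 or later, the native `|>` pipe should work the same way here without loading a package.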
Upvotes: 3