I am doing some data manipulation with dplyr on the brca data set, and I need to solve the following question:
"We are interested in which variable might be the best indicator of the outcome, malignant ('M') or benign ('B'). There are 30 features (variables), and we want to select the one variable that has the largest difference between the means for groups M and B."
After computing the mean of every feature for each outcome group, I have a summary with two rows, one for M and one for B. Now I want to find the difference between these two rows, then find the maximum difference and the name of the column it belongs to.
Can anyone help me with this?
Thanks... :)
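The question does not show how the two-row summary is built. Below is a minimal sketch, assuming the features sit in a data frame with an `outcome` column holding "B"/"M" and one numeric column per feature. The toy `brca_df` is hypothetical and only has two features. If the data comes from the dslabs package (where `brca$x` is the feature matrix and `brca$y` the outcome factor), something like `data.frame(outcome = brca$y, brca$x)` should give this shape, but that is an assumption about where the data came from.

```r
library(dplyr)

# Hypothetical toy data standing in for the brca features:
# an `outcome` column ("B"/"M") plus numeric feature columns
brca_df <- data.frame(
  outcome     = c("B", "B", "M", "M"),
  radius_mean = c(12, 13, 17, 18),
  area_mean   = c(450, 470, 950, 1000)
)

# One row per outcome group, holding the mean of every feature
sumOutcome <- brca_df %>%
  group_by(outcome) %>%
  summarise(across(everything(), mean))

sumOutcome
```

The groups come out sorted alphabetically, so row 1 is B and row 2 is M.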
To get the column name and the value with the largest absolute difference between the two rows, you can do:
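library(dplyr)
library(tidyr)

# sumOutcome has one row per outcome group (B and M) with the mean of each feature
sumOutcome %>%
  # diff() subtracts the first row from the second for every feature column
  summarise(across(-outcome, diff)) %>%
  # reshape the single row into name/value pairs, one per feature
  pivot_longer(cols = everything()) %>%
  # keep the feature whose difference is largest in absolute value
  slice(which.max(abs(value)))
#   name              value
#   <chr>             <dbl>
# 1 concave_pts_worst  436.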
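`diff()` depends on row order: with the groups sorted as B then M, the values are M minus B. Below is a sketch of an alternative that starts from the raw per-sample data, names the groups explicitly and keeps the sign, so you can also see which group has the higher mean. It assumes the same hypothetical `brca_df` layout (an `outcome` column plus numeric features). The toy values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: `outcome` ("B"/"M") plus numeric features
brca_df <- data.frame(
  outcome     = c("B", "B", "M", "M"),
  radius_mean = c(12, 13, 17, 18),
  area_mean   = c(450, 470, 950, 1000)
)

best <- brca_df %>%
  # long format: one row per sample and feature
  pivot_longer(-outcome, names_to = "name", values_to = "value") %>%
  # mean of each feature within each outcome group
  group_by(name, outcome) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  # one row per feature, with columns B and M holding the group means
  pivot_wider(names_from = outcome, values_from = mean) %>%
  # signed difference, independent of row order
  mutate(diff = M - B) %>%
  # feature with the largest absolute difference
  slice_max(abs(diff), n = 1)

best
```

Because the difference is computed from named columns (`M - B`), the result no longer depends on how the summary rows happen to be ordered.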