Reputation: 1353
If I use mtcars as an example:
mtcars <- subset(mtcars, select = c("cyl", "disp"))
How can I add two additional columns, one that indicates that values are below/above the median and one that indicates which quartile the values are in? However, I would like this to be done per group of cyl
.
This is the specific outcome I am hoping for:
cyl disp median_split quartile_split
Toyota Corolla 4 71.1 below_median 1st_quartile
Honda Civic 4 75.7 below_median 1st_quartile
Fiat 128 4 78.7 below_median 1st_quartile
Fiat X1-9 4 79 below_median 2nd_quartile
Lotus Europa 4 95.1 below_median 2nd_quartile
Datsun 710 4 108 median median
Toyota Corona 4 120.1 above_median 3rd_quartile
Porsche 914-2 4 120.3 above_median 3rd_quartile
Volvo 142E 4 121 above_median 4th_quartile
Merc 230 4 140.8 above_median 4th_quartile
Merc 240D 4 146.7 above_median 4th_quartile
Ferrari Dino 6 145 below_median 1st_quartile
Mazda RX4 6 160 etc… etc…
I would appreciate help. Thank you.
Edit following on from akun's answer below
In the quartile_split
column, akun's answer left the lowest value in each cyl group with NA
. I thought I could fix this by adding:
mtcars$quartile_split[is.na(mtcars$quartile_split)] <- "1_quartile" #not a very elegant solution
So the full code would be:
library(dplyr)
mtcars <- subset(mtcars, select = c("cyl", "disp"))
# akrun's answer
mtcars <- mtcars %>%
group_by(cyl) %>%
mutate(median_split = c("above_median", "below_median")[1 +
(disp <= median(disp))],
quartile_split = cut(disp, breaks = quantile(disp),
labels = paste0(1:4, "_quartile")))
# addition
mtcars$quartile_split[is.na(mtcars$quartile_split)] <- "1_quartile" #not a very elegant solution
However, when I looked more carefully, I also found something else that did not seem quite right, specifically, when you look at the cyl = 6
group only, you see this:
cyl disp median_split quartile_split
6 145 below_median 1_quartile
6 160 below_median 1_quartile
6 160 below_median 1_quartile
6 167.6 below_median 2_quartile
6 167.6 below_median 2_quartile
6 225 above_median 4_quartile
6 258 above_median 4_quartile
The median disp
in this group is 163.8, so the two cars with disp = 167.6
should be classified as "above_median", instead of "below_median".
I hope this could be solved, somehow. Thank you again.
Upvotes: 0
Views: 1288
Reputation: 869
With base R and cut
:
mtcars <- subset(mtcars, select = c("cyl", "disp"))
mtcars$median_split <- ifelse(mtcars$disp <= median(mtcars$disp), "below_median","above_median")
mtcars$quantile_split <- cut(mtcars$disp, breaks = c(0, quantile(mtcars$disp)),labels = c("1_quartile",paste0(1:4, "_quartile")))
Be careful when using the cut
function to make sure the breaks include the minimum (or it will return NA), and that the min. is labelled in the 1st quartile.
Upvotes: 1
Reputation: 887223
An option would be to group by 'cyl', use cut
to create the different categories based on the quantile
on 'disp' column
library(dplyr)
mtcars %>%
group_by(cyl) %>%
mutate(median_split = c("above_median", "below_median")[1 +
(disp <= median(disp))],
quartile_split = cut(disp, breaks = quantile(disp),
labels = paste0(1:4, "_quartile")))
Upvotes: 2