Jrem52

Reputation: 13

How can I add a column sorting my data by above or below the median?

I am trying to add a column to my data frame (called boston) that states whether each entry is above or below the median crime rate (the variable is called crim). I have found the median with median(boston$crim), but now I need to add a column that states whether each crime rate is above or below that number.

Upvotes: 1

Views: 277

Answers (2)

TarJae

Reputation: 78927

Here is an example with the iris dataset using the dplyr package. First select only the Sepal.Length column, then use mutate to store the median in a new column, and then use ifelse to label each row as above or below.

library(dplyr)

iris %>% 
  select(Sepal.Length) %>% 
  mutate(Sepal.Length.median = median(Sepal.Length),
         Sepal.Length.above.below = ifelse(Sepal.Length > Sepal.Length.median, "above", "below")
  ) %>% 
  head()

Output:

  Sepal.Length Sepal.Length.median Sepal.Length.above.below
1          5.1                 5.8                    below
2          4.9                 5.8                    below
3          4.7                 5.8                    below
4          4.6                 5.8                    below
5          5.0                 5.8                    below
6          5.4                 5.8                    below
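
For the asker's data, the same idea can be sketched in base R without dplyr. This assumes a data frame named boston with a numeric crim column, as described in the question. The small data frame below is made-up stand-in data, not the real dataset, and the column name crim_above_below is only an illustrative choice.

```r
# Made-up stand-in for the asker's data frame (hypothetical values)
boston <- data.frame(crim = c(0.02, 0.10, 0.25, 3.50, 8.90))

# Median of the column, ignoring missing values
med_crim <- median(boston$crim, na.rm = TRUE)

# Label each row; rows exactly equal to the median fall into "below" here
boston$crim_above_below <- ifelse(boston$crim > med_crim, "above", "below")

print(boston)
```

Note that a two-way ifelse has to put ties somewhere. If rows equal to the median should be reported separately, a three-way split is needed instead.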

Upvotes: 1

Socrates

Reputation: 142

You can use the dplyr package and its case_when function:

library(dplyr)

boston <- boston %>%
  mutate(
    med_crim = median(crim, na.rm = TRUE),
    above_or_below = case_when(
      crim > med_crim ~ "above",
      crim < med_crim ~ "below",
      TRUE ~ "equal"
    ),
    # You can also create a variable with the difference to the median:
    diff_to_median = crim - med_crim
  )
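
If dplyr is not available, a similar three-way labelling can be sketched in base R with sign(). This is a different technique from case_when, shown only as an alternative. It again assumes a data frame named boston with a numeric crim column, and the values below are made up for illustration.

```r
# Made-up stand-in data (hypothetical values); 0.25 is the median
boston <- data.frame(crim = c(0.02, 0.10, 0.25, 3.50, 8.90))

med_crim <- median(boston$crim, na.rm = TRUE)
boston$diff_to_median <- boston$crim - med_crim

# sign() returns -1, 0, or 1, which maps onto the three labels
boston$above_or_below <- as.character(
  factor(sign(boston$diff_to_median),
         levels = c(-1, 0, 1),
         labels = c("below", "equal", "above"))
)

print(boston)
```

One difference worth knowing: with this version, rows where crim is missing stay NA, whereas a final TRUE ~ "equal" branch in case_when would label them "equal".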

Upvotes: 0
