user3032689

Reputation: 667

Compare column with median

I have a data table like the following:

library(data.table)

TDT <- data.table(Group = c(rep("A",40),rep("B",60)),
                  Id = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                  Date = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                  x1 = sample(100,100))

I calculate the median of x1 as follows:

TDT2 <- TDT[, median(x1), by = .(Group,Date)]

My question is: how can I compare each value of x1 in TDT with the median of its Group and Date? For example, if the value is lower than the median, the result should be TRUE. I know I could do this with a nested for loop over Group and Date, but that is very slow on a big data set. Is there a more idiomatic data.table way to do it, perhaps using by?

Upvotes: 1

Views: 142

Answers (2)

akrun

Reputation: 887118

Here is an option using tidyverse, grouping by Group and Date as in the question:

library(tidyverse)
TDT %>%
  group_by(Group, Date) %>%
  mutate(median_x1 = median(x1, na.rm = TRUE), below_median_x1 = x1 < median_x1)

Upvotes: 2

Bulat

Reputation: 6969

You can use := to add new columns to the data.table:

library(data.table)

TDT <- data.table(Group = c(rep("A",40),rep("B",60)),
                  Id = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                  Date = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                  x1 = sample(100,100))

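# add the median within each Group/Date combination by reference
# (as.numeric keeps the column type consistent across groups)
TDT[, median.x1 := as.numeric(median(x1, na.rm = TRUE)), by = .(Group, Date)]
# compare the original values to their group median
TDT[, below.median.x1 := x1 < median.x1]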
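
If the median column itself is not needed, the two steps can probably be collapsed into one grouped assignment. The following is only a sketch under a few assumptions: the seed is fixed purely for reproducibility, the flag column name below.median.x1 is chosen here for illustration, and the sample data is rebuilt so the snippet runs on its own.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed so repeated runs give the same data

# Same shape as the sample data in the question: 5 Ids x 20 month-end dates
TDT <- data.table(Group = c(rep("A", 40), rep("B", 60)),
                  Id = rep(1:5, each = 20),
                  Date = rep(seq(as.Date("2010-01-03"), length.out = 20, by = "1 month") - 1, 5),
                  x1 = sample(100, 100))

# Within each Group/Date combination, median(x1) is a single value that is
# recycled against the x1 values of that group, so the comparison returns
# one logical per row and := stores it by reference.
TDT[, below.median.x1 := x1 < median(x1, na.rm = TRUE), by = .(Group, Date)]

head(TDT)
```

Because the comparison happens inside by, no intermediate median column or separate summary table is created, which should also avoid the nested loop on larger data.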

Upvotes: 2
