Reputation: 311
I have the following data frame and would like to add a dummy variable that is 1 if a value is above its group's median and 0 otherwise.
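df <- data.frame(group = rep(c("A", "B", "c"), 3), value1 = c(1:9))
# median of value1 per group
m <- aggregate(. ~ group, data = df, FUN = median)
names(m)[2] <- "median"
# attach each group's median to its rows (note: merge() sorts the rows by group)
df <- merge(df, m, by = "group", all.x = TRUE)
# 1 if the value is strictly above the group median, else 0
df$median_0_1 <- ifelse(df$median < df$value1, 1, 0)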
Is there a more elegant way to do this?
Also, can I adjust this to set the dummy for values above or below the third quartile?
And is this approach robust, i.e. will it work reliably?
Thanks a lot.
Upvotes: 1
Views: 985
Reputation: 72683
Elegance lies in the eye of the beholder, but how do you like this?
df <- within(df, {
  median <- ave(value1, group, FUN=median)
  median_0_1 <- ifelse(median < value1, 1, 0)
  # probs=.3 is the 30th percentile; use probs=.75 for the third quartile
  quantile3 <- ave(value1, group, FUN=function(x) quantile(x, probs=.3))
  quantile_0_1 <- ifelse(quantile3 < value1, 1, 0)
})
df
#   group value1 quantile_0_1 quantile3 median_0_1 median
# 1     A      1            0       2.8          0      4
# 2     B      2            0       3.8          0      5
# 3     c      3            0       4.8          0      6
# 4     A      4            1       2.8          0      4
# 5     B      5            1       3.8          0      5
# 6     c      6            1       4.8          0      6
# 7     A      7            1       2.8          1      4
# 8     B      8            1       3.8          1      5
# 9     c      9            1       4.8          1      6
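The `probs=.3` used above gives the 30th percentile rather than the third quartile asked about in the question. If the third quartile is what is wanted, a minimal sketch with `probs = 0.75` could look like this (the column names `q3` and `above_q3` are illustrative and not from the original post):

```r
# Same example data as in the question
df <- data.frame(group = rep(c("A", "B", "c"), 3), value1 = 1:9)

# Per-group third quartile, repeated for every row of its group
df$q3 <- ave(df$value1, df$group, FUN = function(x) quantile(x, probs = 0.75))

# 1 if the value lies strictly above its group's third quartile, else 0
df$above_q3 <- as.integer(df$value1 > df$q3)
```

Flipping the comparison to `df$value1 < df$q3` would flag the values below the quartile instead.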
Upvotes: 1
Reputation: 388907
When you want to keep the number of rows in the data frame unchanged, use ave:
df$median_0_1 <- with(df, as.integer(value1 > ave(value1, group, FUN = median)))
This can also be done with dplyr:
library(dplyr)
df %>% group_by(group) %>% mutate(median_0_1 = as.integer(value1 > median(value1)))
Or with data.table:
library(data.table)
setDT(df)[, median_0_1 := as.integer(value1 > median(value1)), by = group]
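On the robustness question, two things are worth keeping in mind with any of these variants: the comparison is strict, so a value equal to its group's median gets 0, and a single `NA` in a group makes that group's `median()` `NA` unless `na.rm = TRUE` is passed. A rough base-R sketch that wraps the `ave` idea in a reusable function might look like the following. The helper name `above_group_quantile` and the column `q3_0_1` are hypothetical, and it assumes that ignoring `NA`s when computing the cutoff is the desired behaviour:

```r
# Hypothetical helper (not from the original answers):
# returns 1 if x is strictly above its group's quantile, 0 otherwise, NA if x is NA
above_group_quantile <- function(x, g, probs = 0.5) {
  # per-group cutoff, ignoring missing values (assumption: NAs should be skipped)
  cutoff <- ave(x, g, FUN = function(v) quantile(v, probs = probs, na.rm = TRUE))
  as.integer(x > cutoff)
}

# Example data from the question, with the last value set to NA
df <- data.frame(group = rep(c("A", "B", "c"), 3), value1 = c(1:8, NA))

df$median_0_1 <- above_group_quantile(df$value1, df$group)                # median
df$q3_0_1     <- above_group_quantile(df$value1, df$group, probs = 0.75)  # third quartile
```

With the default quantile type, `probs = 0.5` gives the same cutoff as `median()`, so one function covers both the median and the quartile case.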
Upvotes: 2