F.Johann

Reputation: 25

Factor levels reaching certain values

I need to count how many factor levels reach (meet or exceed) each of a series of thresholds on a continuous variable.

The code below produces the desired result for the example data, but it is a rather awkward workaround.

My real data frame is much larger, and the real plot should show many more thresholds on the x-axis (or treat it as continuous). I would really appreciate code that scales to that case.

set.seed(5)   
df <- data.frame(ID = factor(c("a","a","b","c","d","e","e")),values = runif(7,0,6))
seq <- 1:5 
length.unique <- function(x) length(unique(x))

sub1 <- df[which(df$values >= 1), ]
sub2 <- df[which(df$values >= 2), ]
sub3 <- df[which(df$values >= 3), ]
sub4 <- df[which(df$values >= 4), ]
sub5 <- df[which(df$values >= 5), ]

N_IDs <- c(length.unique(sub1$ID),length.unique(sub2$ID),length.unique(sub3$ID),length.unique(sub4$ID),length.unique(sub5$ID))
plot(N_IDs ~ seq, type="b")

Upvotes: 1

Views: 44

Answers (2)

eddi

Reputation: 49448

Using non-equi joins:

library(data.table)
setDT(df)

df[.(seq = 1:5), on = .(values >= seq), allow.cartesian = TRUE, .(N_IDs = uniqueN(ID)), by = .EACHI]
#   values N_IDs
#1:      1     4
#2:      2     3
#3:      3     3
#4:      4     3
#5:      5     1
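
For readers without data.table, the quantity the join computes can be sketched in base R. This only illustrates the idea (for each threshold, count the distinct IDs among rows at or above it) and is not eddi's method; the toy data and the names toy, thresholds and n_ids are made up for the example.

```r
# Toy data (hypothetical, chosen so the counts are easy to check by hand)
toy <- data.frame(
  ID     = factor(c("a", "a", "b", "c", "d", "e", "e")),
  values = c(0.5, 2.5, 4.2, 0.8, 5.5, 1.5, 3.1)
)
thresholds <- 1:5

# For each threshold, keep the rows at or above it and count distinct IDs
n_ids <- vapply(
  thresholds,
  function(t) length(unique(toy$ID[toy$values >= t])),
  integer(1)
)

data.frame(threshold = thresholds, N_IDs = n_ids)
#   threshold N_IDs
# 1         1     4
# 2         2     4
# 3         3     3
# 4         4     2
# 5         5     1
```

This rescans all rows once per threshold, so it is probably only reasonable for a modest number of thresholds.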

Upvotes: 1

MrFlick

Reputation: 206401

Using the tidyverse, you can save some time by first calculating the maximum value for each ID:

library(tidyverse)
idmax <- df %>% group_by(ID) %>% summarize(max=max(values)) %>% pull(max)

Then, for each cut point, return the number of IDs whose maximum reaches it:

map_df(1:5, ~data.frame(cut = ., count = sum(idmax >= .)))
#   cut count
# 1   1     4
# 2   2     3
# 3   3     3
# 4   4     3
# 5   5     1
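
The same max-per-ID idea also seems to cover the "continuous x-axis" part of the question. Below is a minimal base R sketch under that assumption; the toy data, the helper count_reaching and the grid spacing of 0.05 are illustrative choices, not part of the original answer.

```r
# Toy data (hypothetical, chosen so the counts are easy to check by hand)
toy <- data.frame(
  ID     = factor(c("a", "a", "b", "c", "d", "e", "e")),
  values = c(0.5, 2.5, 4.2, 0.8, 5.5, 1.5, 3.1)
)

# Maximum value reached by each ID (one entry per factor level)
id_max <- tapply(toy$values, toy$ID, max)

# Number of IDs whose maximum is at or above each threshold in t.
# outer() builds an IDs-by-thresholds logical matrix; colSums() counts
# the TRUE entries per threshold.
count_reaching <- function(t) unname(colSums(outer(id_max, t, ">=")))

count_reaching(1:5)
# [1] 4 4 3 2 1

# For a near-continuous x-axis, evaluate on a fine grid and draw a step plot
grid <- seq(0, 6, by = 0.05)
plot(grid, count_reaching(grid), type = "s",
     xlab = "threshold", ylab = "N_IDs")
```

Because an ID reaches a threshold exactly when its maximum does, only one number per ID has to be compared against each threshold, which should help on a much larger data frame.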

Upvotes: 1
