MaxMiak
MaxMiak

Reputation: 197

mutiple variable outliers with different condition

With regards hope you all are doing well. I have a data frame as below I want to flag different variables as an outlier with different conditions. Basically, I want to check some columns that have values more than threshold as I mention them in the bottom I will group_by id and flag those outlies and select id,question_name, outlier value as text_answer and count number of occurrences of particular value entered by a particular id. The problem is I can get only the required output for a single variable with my code but I want to have all of them at one go.

raw_data <- data.frame(
id=c(1,2,1,2,3,4,4,3),
shelter_number=c(3,2,3,2,11,15,15,11),
question_name=c("A","B","C","A"),
size=c(12,11,12,10,9,15,18,14),
shelter_age=c(20,35,50,55,62,44,70,65)
)

What I am trying is

raw_data <- raw_data %>% filter(shelter_number>=10 ) 

outliers <- raw_data %>% select(id,shelter_number) %>% 
  pivot_longer(cols = -id,
               names_to = "Question_name", values_to= "Value", 
               values_drop_na = TRUE) %>%
  count(id, Question_name, Value) %>% rename(text_answer=Value,count=n)

What I want is as below

expected_output <- data.frame(
  id=c(3,4,1,2,2,3,4,4,1,2,3,3,4),
  question_name=c("shelter","shelter","size","size","size","size","size","size","shelter_age","shelter_age","shelter_age","shelter_age","shelter_age"),
  text_answer=c(11,15,12,10,11,14,15,18,50,55,62,65,70),
  count=c(2,2,2,1,1,1,1,1,1,1,1,1,1)
)

Thresholds are as below

shelter_age >50  
shelter_number>10  
size>10

Thanks in advance

Upvotes: 0

Views: 30

Answers (1)

ekoam
ekoam

Reputation: 8844

Is this what you want?

raw_data %>% 
  select(-question_name) %>% 
  pivot_longer(-id, names_to = "question_name", values_to = "text_answer") %>% 
  filter(
    question_name == "shelter_number" & text_answer > 10 |
      question_name == "shelter_age" & text_answer > 50 |
      question_name == "size" & text_answer > 10
  ) %>% 
  count(id, question_name, text_answer, name = "count")

Upvotes: 1

Related Questions