Reputation: 197
With regards hope you all are doing well. I have a data frame as below I want to flag different variables as an outlier with different conditions. Basically, I want to check some columns that have values more than threshold as I mention them in the bottom I will group_by id and flag those outlies and select id,question_name, outlier value as text_answer and count number of occurrences of particular value entered by a particular id. The problem is I can get only the required output for a single variable with my code but I want to have all of them at one go.
raw_data <- data.frame(
id=c(1,2,1,2,3,4,4,3),
shelter_number=c(3,2,3,2,11,15,15,11),
question_name=c("A","B","C","A"),
size=c(12,11,12,10,9,15,18,14),
shelter_age=c(20,35,50,55,62,44,70,65)
)
What I am trying is
raw_data <- raw_data %>% filter(shelter_number>=10 )
outliers <- raw_data %>% select(id,shelter_number) %>%
pivot_longer(cols = -id,
names_to = "Question_name", values_to= "Value",
values_drop_na = TRUE) %>%
count(id, Question_name, Value) %>% rename(text_answer=Value,count=n)
What I want is as below
expected_output <- data.frame(
id=c(3,4,1,2,2,3,4,4,1,2,3,3,4),
question_name=c("shelter","shelter","size","size","size","size","size","size","shelter_age","shelter_age","shelter_age","shelter_age","shelter_age"),
text_answer=c(11,15,12,10,11,14,15,18,50,55,62,65,70),
count=c(2,2,2,1,1,1,1,1,1,1,1,1,1)
)
Thresholds are as below
shelter_age >50
shelter_number>10
size>10
Thanks in advance
Upvotes: 0
Views: 30
Reputation: 8844
Is this what you want?
raw_data %>%
select(-question_name) %>%
pivot_longer(-id, names_to = "question_name", values_to = "text_answer") %>%
filter(
question_name == "shelter_number" & text_answer > 10 |
question_name == "shelter_age" & text_answer > 50 |
question_name == "size" & text_answer > 10
) %>%
count(id, question_name, text_answer, name = "count")
Upvotes: 1