Reputation: 99
I have a data set with subject ID's where I'm taking hourly measurements based on different criteria ("metrics"). Each of these metrics either does or does not meet compliance (1 for yes, 0 for no). I have two goals:
Tasks
(1) Determine "full compliance" i.e. a subject has 1's for every metric for every hour. If there is a zero, the subject fails to meet "full compliance".
(2) Determine the percentage of compliance for each subject. I.e. if a subject has 3/4 hours of all 1's then they are 75% compliant.
I would like to solve this using dplyr
and combinations of filter()
or some aggregate function I may not be familiar with.
Example
Here is an example dataset:
df <- data.frame(
"Subject ID" = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
"Metric 1" = rep("value", 12),
"Compliance 1" = rep(1, 12),
"Metric 2" = rep("value", 12),
"Compliance 2" = c(1,1,1,1,1,0,1,1,1,1,1,1),
"Metric 3" = rep("value", 12),
"Compliance 3" = c(1,1,1,1,0,1,1,1,1,1,1,1)
)
And here is what I would like to use dplyr
to achieve for point (1):
df2 <- data.frame(
"Subject ID" = c("A", "A", "A", "A", "C", "C", "C", "C"),
"Metric 1" = rep("value", 8),
"Compliance 1" = rep(1, 8),
"Metric 2" = rep("value", 8),
"Compliance 2" = rep(1, 8),
"Metric 3" = rep("value", 8),
"Compliance 3" = rep(1, 8)
)
For (2) I would like to aggregate the data in a way to show that Subject (A) had 100% compliance but (B) had 50% compliance.
Edited Answer
Thanks to tmfmnk I was able to get to the condensed percentages using:
df %>%
mutate(cond = rowMeans(select(., starts_with("Compliance"))) == 1) %>%
group_by(Subject.ID) %>%
tally(cond) %>%
mutate(perc = n/4)
Subject.ID n perc
<fct> <int> <dbl>
1 A 4 1
2 B 2 0.5
3 C 4 1
Upvotes: 0
Views: 245
Reputation: 39858
One possibility could be:
df %>%
mutate(cond = rowMeans(select(., starts_with("Compliance"))) == 1) %>%
group_by(Subject.ID) %>%
filter(all(cond)) %>%
select(-cond)
Subject.ID Metric.1 Compliance.1 Metric.2 Compliance.2 Metric.3 Compliance.3
1 A value 1 value 1 value 1
2 A value 1 value 1 value 1
3 A value 1 value 1 value 1
4 A value 1 value 1 value 1
5 C value 1 value 1 value 1
6 C value 1 value 1 value 1
7 C value 1 value 1 value 1
8 C value 1 value 1 value 1
If you want the percentage, then you can do:
df %>%
mutate(cond = rowMeans(select(., starts_with("Compliance"))))
Subject.ID Metric.1 Compliance.1 Metric.2 Compliance.2 Metric.3 Compliance.3 cond
1 A value 1 value 1 value 1 1.0000000
2 A value 1 value 1 value 1 1.0000000
3 A value 1 value 1 value 1 1.0000000
4 A value 1 value 1 value 1 1.0000000
5 B value 1 value 1 value 0 0.6666667
6 B value 1 value 0 value 1 0.6666667
7 B value 1 value 1 value 1 1.0000000
8 B value 1 value 1 value 1 1.0000000
9 C value 1 value 1 value 1 1.0000000
10 C value 1 value 1 value 1 1.0000000
11 C value 1 value 1 value 1 1.0000000
12 C value 1 value 1 value 1 1.0000000
To get a percentage of cases fulfilling the condition per group (as proposed by @rsh52):
df %>%
mutate(cond = rowMeans(select(., starts_with("Compliance"))) == 1) %>%
group_by(Subject.ID) %>%
tally(cond) %>%
mutate(perc = n/4)
Subject.ID n perc
<fct> <int> <dbl>
1 A 4 1
2 B 2 0.5
3 C 4 1
Upvotes: 2
Reputation: 1114
This answer is a bit lengthy, but the logic should be easy enough to follow -- sum the 1's per compliance column, calculate the compliance percentage, remove subjects with <100% compliance on the 3 compliance columns. Then filter out those subjects from the original dataframe.
compliant <- df %>% select(Subject.ID,matches('Compliance')) %>%
gather(k,v,-Subject.ID) %>%
group_by(Subject.ID,k) %>%
summarise(sum=sum(v),n=n()) %>%
group_by(Subject.ID,k) %>%
mutate(perc=sum/sum(n)) %>%
group_by(Subject.ID) %>% summarise(c=sum(perc)) %>%
filter(c==3)
# A tibble: 2 x 2
Subject.ID c
<fct> <dbl>
1 A 3
2 C 3
df %>% filter(Subject.ID %in% compliant$Subject.ID)
Subject.ID Metric.1 Compliance.1 Metric.2 Compliance.2 Metric.3 Compliance.3
1 A value 1 value 1 value 1
2 A value 1 value 1 value 1
3 A value 1 value 1 value 1
4 A value 1 value 1 value 1
5 C value 1 value 1 value 1
6 C value 1 value 1 value 1
7 C value 1 value 1 value 1
8 C value 1 value 1 value 1
Upvotes: 1