Reputation: 251
Using the example data provided below: For each institution type ("a" and "b") I want to drop rows with fac == "no" if there exists a row with fac == "yes" for the same year. I then want to sum the values by year. I am, however, not able to figure out how to drop the correct "no"-rows. Below are a couple of my attempts based on answers give here.
set.seed(123)
ext <- tibble(
institution = c(rep("a", 7), rep("b", 7)),
year = rep(c("2005", "2005", "2006", "2007", "2008", "2009", "2009"), 2),
fac = rep(c("yes", "no", "no", "no", "no", "yes", "no"), 2),
value = sample(1:100, 14, replace=T)
)
ext %>%
group_by(institution, year) %>%
filter(if (fac == "yes") fac != "no")
ext %>%
group_by(institution, year) %>%
case_when(fac == "yes" ~ filter(., fac != "no"))
ext %>%
group_by(institution, year) %>%
{if (fac == "yes") filter(., fac != "no")}
Upvotes: 4
Views: 1529
Reputation: 887118
An option with data.table
library(data.table)
setDT(ext)[ext[, .I[fac == 'yes'|all(fac == 'no')], .(institution, year)]$V1]
Upvotes: 1
Reputation: 3269
Another way would be:
library(dplyr)
ext %>%
group_by(institution, year) %>%
filter(fac == 'yes' | n() < 2)
# institution year fac value
# 1 a 2005 yes 31
# 2 a 2006 no 51
# 3 a 2007 no 14
# 4 a 2008 no 67
# 5 a 2009 yes 42
# 6 b 2005 yes 43
# 7 b 2006 no 25
# 8 b 2007 no 90
# 9 b 2008 no 91
# 10 b 2009 yes 69
In case you want the overall amounts by year, add these two lines, which will yield the following output:
group_by(year) %>%
summarise(value=sum(value))
# year value
# <chr> <int>
# 1 2005 74
# 2 2006 76
# 3 2007 104
# 4 2008 158
# 5 2009 111
Upvotes: 3
Reputation: 39595
Try creating a flag to identify the yes occurence and after that filter only the desired values. You would need to group by institution
and year
. Then, compute the length of values with yes greater or equal to one. With that you can flag the no values if there is some value yes inside the group. Finally, filter only the zero values in Flag
and you will drop the rows as you expected. Here the code:
library(dplyr)
#Code
newdf <- ext %>% group_by(institution,year) %>%
mutate(NYes=length(fac[fac=='yes']),
Flag=ifelse(fac=='no' & NYes>=1,1,0)) %>%
filter(Flag==0) %>% select(-c(NYes,Flag))
Output:
# A tibble: 10 x 4
# Groups: institution, year [10]
institution year fac value
<chr> <chr> <chr> <int>
1 a 2005 yes 31
2 a 2006 no 51
3 a 2007 no 14
4 a 2008 no 67
5 a 2009 yes 42
6 b 2005 yes 43
7 b 2006 no 25
8 b 2007 no 90
9 b 2008 no 91
10 b 2009 yes 69
And the full code to summarise by year:
#Code 2
newdf <- ext %>% group_by(institution,year) %>%
mutate(NYes=length(fac[fac=='yes']),
Flag=ifelse(fac=='no' & NYes>=1,1,0)) %>%
filter(Flag==0) %>% select(-c(NYes,Flag)) %>%
ungroup() %>%
group_by(year) %>%
summarise(value=sum(value))
Output:
# A tibble: 5 x 2
year value
<chr> <int>
1 2005 74
2 2006 76
3 2007 104
4 2008 158
5 2009 111
Upvotes: 1
Reputation: 11584
Does this work: by summarise, I assumed you want to sum by year after applying the filtering.
library(dplyr)
ext %>% group_by(institution, year) %>% filter(fac == 'yes'|all(fac == 'no'))
# A tibble: 10 x 4
# Groups: institution, year [10]
institution year fac value
<chr> <chr> <chr> <int>
1 a 2005 yes 31
2 a 2006 no 51
3 a 2007 no 14
4 a 2008 no 67
5 a 2009 yes 42
6 b 2005 yes 43
7 b 2006 no 25
8 b 2007 no 90
9 b 2008 no 91
10 b 2009 yes 69
ext %>% group_by(institution, year) %>% filter(fac == 'yes'|all(fac == 'no')) %>%
ungroup() %>% group_by(year) %>% summarise(value = sum(value))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 5 x 2
year value
<chr> <int>
1 2005 74
2 2006 76
3 2007 104
4 2008 158
5 2009 111
Upvotes: 2