Reputation: 173
Here is an illustration of my problem. Sample data:
df <- data.frame(ID = c(1, 1, 2, 2, 3, 5), A = c("foo", "bar", "foo", "foo", "bar", "bar"),
B = c(1, 5, 7, 23, 54, 202))
df
ID A B
1 1 foo 1
2 1 bar 5
3 2 foo 7
4 2 foo 23
5 3 bar 54
6 5 bar 202
What I want to do is summarize by ID and count the rows for each ID. In addition, I want the frequency of each ID within subgroups defined by numeric ranges of B (the number of observations with B>=0 & B<5, B>=5 & B<10, B>=10 & B<15, B>=15 & B<20, etc., for all IDs).
I want this result:
ID count count_0_5 count_5_10 etc
1 1 2 1 1 etc
2 2 2 NA 1 etc
3 3 1 NA NA etc
4 5 1 NA NA etc
I tried this code using the dplyr package:
df %>%
group_by(ID) %>%
summarize(count=n(), count_0_5 = n(B>=0 & B<5))
However, it returns this error:
`Error in n(B>=0 & B<5) :
unused argument (B>=0 & B<5)`
Upvotes: 4
Views: 205
Reputation: 13125
library(dplyr)
library(tidyr)
# cut() closes intervals on the right by default, i.e. (0,5], (5,10], ...
# right = FALSE gives [0,5), [5,10), ... which matches B>=0 & B<5 in the question
breaks <- c(0, 5, 10, 15, 20, 1000)
labels <- c('count_0_5', 'count_5_10', 'count_10_15', 'count_15_20', 'count_20_1000')
df %>% group_by(ID) %>%
  mutate(B_cut = cut(B, breaks, labels = labels, right = FALSE), count = n()) %>%
  group_by(ID, B_cut) %>% mutate(n = n()) %>% slice(1) %>% select(-A, -B) %>%
  spread(B_cut, n)
# 2nd option (the per-ID total is named n here)
left_join(df %>% group_by(ID) %>% summarise(n = n()),
          df %>% mutate(B_cut = cut(B, breaks, labels = labels, right = FALSE)) %>%
            count(ID, B_cut) %>% spread(B_cut, n),
          by = 'ID')
# Output of the first option:
# A tibble: 4 x 5
# Groups: ID [4]
     ID count count_0_5 count_5_10 count_20_1000
  <dbl> <int>     <int>      <int>         <int>
1     1     2         1          1            NA
2     2     2        NA          1             1
3     3     1        NA         NA             1
4     5     1        NA         NA             1
Upvotes: 1
Reputation: 359
Perhaps replace n(B>=0 & B<5) with sum(B>=0 & B<5)?
This counts the rows where both conditions are TRUE, because sum() treats TRUE as 1 and FALSE as 0. However, you'll get 0s instead of NAs. That can be fixed with:
ifelse(sum(B>=0 & B<5)>0, sum(B>=0 & B<5), NA)
There may well be a better solution (clearer and more efficient), but this should work!
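For what it's worth, here is a minimal sketch of that idea applied to the sample data from the question, assuming dplyr is installed. It swaps the ifelse() construct for dplyr's na_if(), which turns a zero count into NA without writing the condition twice. Only the first two ranges are spelled out; the remaining columns would follow the same pattern.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(ID = c(1, 1, 2, 2, 3, 5),
                 A = c("foo", "bar", "foo", "foo", "bar", "bar"),
                 B = c(1, 5, 7, 23, 54, 202))

res <- df %>%
  group_by(ID) %>%
  summarize(
    count      = n(),                                  # rows per ID
    count_0_5  = na_if(sum(B >= 0 & B < 5), 0L),       # TRUEs are summed as 1s; 0 becomes NA
    count_5_10 = na_if(sum(B >= 5 & B < 10), 0L)
  )

print(res)
```

On the sample data this should give counts of 2, 2, 1, 1 per ID, with count_0_5 filled only for ID 1 and count_5_10 filled for IDs 1 and 2, which is the layout asked for in the question.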
Upvotes: 3