Reputation: 43
I wasn't exactly sure how to title this question but here is what I am trying to do. I have a dataframe with a "trip" column, and another with a "species caught" column. I am trying to count the number of trips where each species was captured with my species of interest. For example, say 5 trips caught both my species of interest and species x. I have created a simplified example here:
trip = c(1,1,1,2,2,3,3,3,3,4,5)
color = c("red","orange","green","red","orange","orange","green","blue","purple","red","green")
dat = as.data.frame(cbind(trip,color))
dat
> dat
trip color
1 1 red
2 1 orange
3 1 green
4 2 red
5 2 orange
6 3 orange
7 3 green
8 3 blue
9 3 purple
10 4 red
11 5 green
say this is my dataframe, and I want to count the number of trips that contain the color red plus every other color. So I would end up with a dataframe that looks like this:
color2 = c("orange","green","blue","purple")
trips.with.red = c(2,1,0,0)
dat2 = as.data.frame(cbind(color2,trips.with.red))
dat2
> dat2
color2 trips.with.red
1 orange 2
2 green 1
3 blue 0
4 purple 0
Where for each of the other colors in the dataset, I get a column that shows the number of trips that contained that particular color and red. Any advice on how to do this would be appreciated.
Upvotes: 1
Views: 72
Reputation: 11546
Does this work:
> dat %>% group_by(trip) %>% mutate(flag = map_dbl(color, ~ if(.x == 'red') 1 else 0)) %>%
+ mutate(flag = max(flag)) %>%
+ filter(color != 'red') %>% ungroup() %>% group_by(color) %>%
+ summarise(trips_with_red = sum(flag)) %>% arrange(desc(trips_with_red))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
color trips_with_red
<chr> <dbl>
1 orange 2
2 green 1
3 blue 0
4 purple 0
>
Upvotes: 0
Reputation: 30514
With dplyr
, you can add an indicator if any
row within a trip
group includes color
of red. Then, grouping by color
you can summarise
the total of these trips.
library(dplyr)
dat %>%
group_by(trip) %>%
mutate(trip_with_red = any(color == "red")) %>%
filter(color != "red") %>%
group_by(color) %>%
summarise(trips_with_red = sum(trip_with_red))
Output
color trips_with_red
<chr> <int>
1 blue 0
2 green 1
3 orange 2
4 purple 0
Upvotes: 2