Dr.FishGirl
Dr.FishGirl

Reputation: 43

Count combinations of variable levels in R

I wasn't exactly sure how to title this question but here is what I am trying to do. I have a dataframe with a "trip" column, and another with a "species caught" column. I am trying to count the number of trips where each species was captured with my species of interest. For example, say 5 trips caught both my species of interest and species x. I have created a simplified example here:

trip = c(1,1,1,2,2,3,3,3,3,4,5)
color = c("red","orange","green","red","orange","orange","green","blue","purple","red","green")
dat = as.data.frame(cbind(trip,color))
dat

> dat
   trip  color
1     1    red
2     1 orange
3     1  green
4     2    red
5     2 orange
6     3 orange
7     3  green
8     3   blue
9     3 purple
10    4    red
11    5  green

say this is my dataframe, and I want to count the number of trips that contain the color red plus every other color. So I would end up with a dataframe that looks like this:

color2 = c("orange","green","blue","purple")
trips.with.red = c(2,1,0,0)
dat2 = as.data.frame(cbind(color2,trips.with.red))
dat2

> dat2
  color2 trips.with.red
1 orange              2
2  green              1
3   blue              0
4 purple              0

Where for each of the other colors in the dataset, I get a column that shows the number of trips that contained that particular color and red. Any advice on how to do this would be appreciated.

Upvotes: 1

Views: 72

Answers (2)

Karthik S
Karthik S

Reputation: 11546

Does this work:

> dat %>% group_by(trip) %>% mutate(flag = map_dbl(color, ~ if(.x == 'red') 1 else 0)) %>% 
+                                   mutate(flag = max(flag)) %>% 
+                                               filter(color != 'red') %>% ungroup() %>% group_by(color) %>% 
+                                                       summarise(trips_with_red = sum(flag)) %>% arrange(desc(trips_with_red))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
  color  trips_with_red
  <chr>           <dbl>
1 orange              2
2 green               1
3 blue                0
4 purple              0
> 

Upvotes: 0

Ben
Ben

Reputation: 30514

With dplyr, you can add an indicator if any row within a trip group includes color of red. Then, grouping by color you can summarise the total of these trips.

library(dplyr)

dat %>%
  group_by(trip) %>%
  mutate(trip_with_red = any(color == "red")) %>%
  filter(color != "red") %>%
  group_by(color) %>%
  summarise(trips_with_red = sum(trip_with_red))

Output

  color  trips_with_red
  <chr>           <int>
1 blue                0
2 green               1
3 orange              2
4 purple              0

Upvotes: 2

Related Questions