hpy
hpy

Reputation: 2161

R: Conditional nested grouped summaries with dplyr?

Thanks to @Frank with my previous post (more details there), where I was able to use to answer some questions about a dataset on people's drinking patterns in bars:

bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
The Hip Joint,Homer,Scotch whiskey,3,FALSE
The Hip Joint,Homer,Corona,1,TRUE
The Hip Joint,Homer,Budweiser,1,FALSE
The Hip Joint,Krusty,Romulan ale,3,TRUE
The Hip Joint,Krusty,Black Hole,4,FALSE
The Hip Joint,Krusty,Corona,1,TRUE
The Hip Joint,Rebecca,Corona,2,TRUE
The Hip Joint,Rebecca,Romulan ale,4,FALSE
The Hip Joint,Bender,Corona,1,TRUE
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE
Quark’s Bar,Bender,Black Hole,1,TRUE
Quark’s Bar,Bender,water,1,FALSE
Quark’s Bar,Bender,unspecified,1,TRUE
Quark’s Bar,Homer,Black Hole,2,FALSE
Quark’s Bar,Guinan,unspecified,2,TRUE
Quark’s Bar,Guinan,Black Hole,1,TRUE
Quark’s Bar,Krusty,Black Hole,1,FALSE
Quark’s Bar,Krusty,water,2,FALSE
Quark’s Bar,Rebecca,unspecified,1,FALSE
Maz’s Tavern,Krusty,water,1,TRUE
Maz’s Tavern,Rebecca,water,1,FALSE
Maz’s Tavern,Homer,water,1,TRUE
Maz’s Tavern,Bender,water,2,FALSE

Specifically, @Frank suggested this code:

DF %>%
  arrange(drink_ordered, times_ordered, liked_it) %>% group_by(bar_name, person) %>%
  summarise(
    Ld   = toString(drink_ordered),
    Ldt  = paste(Ld, toString(times_ordered), sep="_"),
    Ldtl = paste(Ldt, toString(liked_it), sep="_")
  ) %>% 
  group_by(bar_name) %>% 
  summarise_each(funs(n_distinct)) %>%
  mutate_each(funs(. == 1), -person, -bar_name)

Which produces grouped summaries for whether patrons ordered the same drinks in each bar, how many of them, and if they liked them:

#        bar_name person    Ld   Ldt  Ldtl
#           (chr)  (int) (lgl) (lgl) (lgl)
# 1        Cheers      3  TRUE  TRUE FALSE
# 2  Moe’s Tavern      3  TRUE FALSE FALSE
# 3    Mos Eisley      3  TRUE  TRUE  TRUE
# 4   Ten Forward      3 FALSE FALSE FALSE
# 5 The Hip Joint      4 FALSE FALSE FALSE

For this post, however, I've got an additional issue where some people's drink orders were unspecified (in Quark's Bar), and some people ordered water:

  1. For unspecified, I want it to act as a "generic" drink so it would not be counted as a different drink (if other drinks were ordered at that bar). For example, in Quark's Bar I want to result to be TRUE that everyone ordered the same drink. Of course, if in a bar everyone only ordered unspecified, the result would be TRUE, too.

  2. For water, I generally want it to be ignored (e.g. because it's not an alcoholic drink!), so at first I thought I could simply use dplyr's filter() to remove rows of data where the order was water. The complication is that I want the results to be TRUE when the only thing people ordered was water, such as in Maz's Tavern. So I don't think I can simply remove rows with water, I want them to be considered! In other words, I don't want water to count unless it was the only unique thing ever ordered in that bar_name.

Is there a way to conditionally (is that the right term?) deal with "exceptional" items like water or unspecified? I prefer a dplyr (i.e. Hadley-verse) based solution that produces a table like the one @Frank did with the code above which takes those two items into account, though whatever you can think of will be appreciated. Thank you!

Upvotes: 1

Views: 54

Answers (0)

Related Questions