Reputation: 661
I'm trying to use dplyr to retrieve and summarize data. The code below works, but I would like to combine it all into one statement. Is this possible?
# create datasets
set.seed(1)
dbo_games <- data.frame(
  name = sample(c("Team1","Team2","Team3","Team4","Team5","Team6","Team7","Team8","Team9","Team10")),
  total_games = sample(1:10)
)
set.seed(1)
dbo_wins <- data.frame(
  name = sample(c("Team1","Team2","Team3","Team4","Team5","Team6","Team7","Team8","Team9","Team10")),
  total_wins = sample(c("yes", "no"), 10, replace = TRUE)
)
total_games <- con %>% tbl("dbo_games")
total_wins <- con %>% tbl("dbo_wins")
total <- total_games %>%
  filter(games > 12) %>%
  group_by(NAME) %>%
  summarise(total_games = n_distinct(game_id)) %>%
  collect()
wins <- total_wins %>%
  filter(win == 'Y') %>%
  group_by(NAME) %>%
  summarise(total_wins = n_distinct(game_id)) %>%
  collect()
perc_win <- total %>%
  left_join(wins, by = "NAME") %>%
  mutate(pct_won = total_wins / total_games)
This code works, but I suspect there is a more succinct way to get the same result. Any thoughts?
Upvotes: 0
Views: 581
Reputation: 3183
This would have been easier to answer if you had shared sample data and explained what you are trying to achieve.
However, you can still chain the steps together as follows:
total_games %>%
  filter(games > 12) %>%
  group_by(NAME) %>%
  summarise(total_games = n_distinct(game_id)) %>%
  left_join(
    total_wins %>%
      filter(win == 'Y') %>%
      group_by(NAME) %>%
      summarise(total_wins = n_distinct(game_id)),
    by = "NAME"
  ) %>%
  mutate(pct_won = total_wins / total_games)
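Since the question has no reproducible connection, here is a minimal sketch of the same chain on in-memory data frames. The tables games_tbl and wins_tbl are hypothetical stand-ins, and their columns (NAME, game_id, games, win) are assumed from the query in the question rather than taken from the real database. The coalesce() step is an optional addition, not part of the original code: it turns the NA that left_join() produces for teams with no wins into 0.

```r
library(dplyr)

# Hypothetical stand-ins for the two database tables; the column names
# (NAME, game_id, games, win) are assumed from the query in the question.
games_tbl <- data.frame(
  NAME    = c("Team1", "Team1", "Team1", "Team2", "Team2", "Team3"),
  game_id = c(1, 2, 3, 4, 5, 6),
  games   = c(20, 20, 20, 15, 15, 5)
)
wins_tbl <- data.frame(
  NAME    = c("Team1", "Team1", "Team1", "Team2"),
  game_id = c(1, 2, 3, 4),
  win     = c("Y", "Y", "N", "N")
)

perc_win <- games_tbl %>%
  filter(games > 12) %>%                          # drops Team3
  group_by(NAME) %>%
  summarise(total_games = n_distinct(game_id)) %>%
  left_join(
    wins_tbl %>%
      filter(win == "Y") %>%
      group_by(NAME) %>%
      summarise(total_wins = n_distinct(game_id)),
    by = "NAME"                                   # explicit key, no join message
  ) %>%
  mutate(
    total_wins = coalesce(total_wins, 0L),        # teams with no wins: NA -> 0
    pct_won    = total_wins / total_games
  )

print(perc_win)
```

On lazy database tables created with tbl(), the same chain should stay in the database until you add collect() at the end, which is the step that actually retrieves the rows into R.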
Upvotes: 1