Reputation: 63
In the first sheet, I have a sample of my full data frame. This has already been filtered by the query:
harden <- rockets %>% filter(grepl("Harden", rockets_lineup))
In the second sheet, when I group by "game_id" and then summarize the "play_length" column, I get those results after dividing by 60. Perfect, I essentially have the total time that James Harden is in the "rockets_lineup" column. That's a solid start, but I would like to go further. Ideally, I want to get the play_length sums for the other players in the "rockets_lineup" column, as well as other sums. So the step before that would be grouping by game_id and then grouping by the grepl I suppose? How would I go about this? Sheet 3 has the intended outcome.
(numbers are based on whole data frame, if yours don't match, it's fine)
Upvotes: 0
Views: 135
Reputation: 3152
I think I have a solution to your question. Using separate_rows
function from dplyr.
What you have to do is saparate into rows the variable away_lineup
where each new observation is a copy of the original, allowing you to group by away_lineup
and game_id
to get the summarise you already did to James Harden
Try this code.
librar(tidyverse)
data %>%
separate_rows(away_lineup, sep = ',') %>%
mutate(away_lineup = str_trim(away_lineup),
play_length = parse_number(play_length)) %>%
group_by(game_id, away_lineup) %>%
summarise(minute = sum(play_length)/60)
data
data <- structure(list(away_lineup = c("James Harden, Carmelo Anthony, Eric Gordon, Michael Carter-Williams, PJ Tucker",
"Clint Capela, James Harden, Carmelo Anthony, Eric Gordon, Michael Carter-Williams",
"Clint Capela, James Harden, Carmelo Anthony, Eric Gordon, Michael Carter-Williams",
"Clint Capela, James Harden, Carmelo Anthony, Eric Gordon, Michael Carter-Williams",
"James Ennis III, Clint Capela, Eric Gordon, James Harden, Carmelo Anthony",
"James Ennis III, Clint Capela, Eric Gordon, James Harden, Carmelo Anthony",
"James Ennis III, Clint Capela, Eric Gordon, James Harden, Carmelo Anthony",
"James Ennis III, Clint Capela, Eric Gordon, James Harden, Carmelo Anthony",
"Clint Capela, Eric Gordon, James Harden, Carmelo Anthony, Gerald Green",
"Clint Capela, Eric Gordon, James Harden, Carmelo Anthony, Gerald Green",
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul",
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul",
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul",
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul",
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul",
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul",
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul",
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green",
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green",
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green",
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green",
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green",
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green",
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green",
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker",
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker",
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker",
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker",
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker",
"PJ Tucker, James Harden, Gerald Green, Danuel House Jr., James Ennis III",
"PJ Tucker, James Harden, Gerald Green, Danuel House Jr., James Ennis III",
"PJ Tucker, James Harden, Gerald Green, Danuel House Jr., James Ennis III",
"PJ Tucker, James Harden, Gerald Green, Danuel House Jr., James Ennis III"
), play_length = c("0S", "13S", "9S", "4S", "19S", "6S", "8S",
"0S", "0S", "18S", "11S", "9S", "13S", "12S", "3S", "7S", "5S",
"7S", "24S", "9S", "19S", "7S", "19S", "16S", "17S", "2S", "15S",
"3S", "2S", "1S", "9S", "2S", "2S"), game_id = c(21800009, 21800009,
21800009, 21800009, 21800039, 21800039, 21800039, 21800039, 21800039,
21800039, 21800180, 21800180, 21800180, 21800180, 21800180, 21800180,
21800180, 21800512, 21800512, 21800512, 21800512, 21800512, 21800512,
21800512, 21800565, 21800565, 21800565, 21800565, 21800565, 21800610,
21800610, 21800610, 21800610)), row.names = c(NA, -33L), class = c("tbl_df",
"tbl", "data.frame"))
Please, let me know if it is what you want.
Upvotes: 0