Small Chimp
Small Chimp

Reputation: 63

Grouping by individual contents of a column with grepl/dplyr

In the first sheet, I have a sample of my full data frame. This has already been filtered by the query:

harden <- rockets %>% filter(grepl("Harden", rockets_lineup))

In the second sheet, when I group by "game_id" and then summarize the "play_length" column, I get those results after dividing by 60. Perfect, I essentially have the total time that James Harden is in the "rockets_lineup" column. That's a solid start, but I would like to go further. Ideally, I want to get the play_length sums for the other players in the "rockets_lineup" column, as well as other sums. So the step before that would be grouping by game_id and then grouping by the grepl I suppose? How would I go about this? Sheet 3 has the intended outcome.

(numbers are based on whole data frame, if yours don't match, it's fine)

workbook

Upvotes: 0

Views: 135

Answers (1)

Johan Rosa
Johan Rosa

Reputation: 3152

I think I have a solution to your question. Using separate_rows function from dplyr.

What you have to do is saparate into rows the variable away_lineup where each new observation is a copy of the original, allowing you to group by away_lineup and game_id to get the summarise you already did to James Harden

Try this code.

librar(tidyverse)

data %>%
  separate_rows(away_lineup, sep = ',') %>%
  mutate(away_lineup = str_trim(away_lineup),
         play_length = parse_number(play_length)) %>% 
  group_by(game_id, away_lineup) %>%
  summarise(minute = sum(play_length)/60)

data

data <- structure(list(away_lineup = c("James Harden, Carmelo Anthony, Eric Gordon, Michael Carter-Williams, PJ Tucker", 
"Clint Capela, James Harden, Carmelo Anthony, Eric Gordon, Michael Carter-Williams", 
"Clint Capela, James Harden, Carmelo Anthony, Eric Gordon, Michael Carter-Williams", 
"Clint Capela, James Harden, Carmelo Anthony, Eric Gordon, Michael Carter-Williams", 
"James Ennis III, Clint Capela, Eric Gordon, James Harden, Carmelo Anthony", 
"James Ennis III, Clint Capela, Eric Gordon, James Harden, Carmelo Anthony", 
"James Ennis III, Clint Capela, Eric Gordon, James Harden, Carmelo Anthony", 
"James Ennis III, Clint Capela, Eric Gordon, James Harden, Carmelo Anthony", 
"Clint Capela, Eric Gordon, James Harden, Carmelo Anthony, Gerald Green", 
"Clint Capela, Eric Gordon, James Harden, Carmelo Anthony, Gerald Green", 
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul", 
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul", 
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul", 
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul", 
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul", 
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul", 
"James Harden, Eric Gordon, Gary Clark, Isaiah Hartenstein, Chris Paul", 
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green", 
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green", 
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green", 
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green", 
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green", 
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green", 
"PJ Tucker, Clint Capela, James Harden, Austin Rivers, Gerald Green", 
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker", 
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker", 
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker", 
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker", 
"Danuel House Jr., Austin Rivers, Gerald Green, James Harden, PJ Tucker", 
"PJ Tucker, James Harden, Gerald Green, Danuel House Jr., James Ennis III", 
"PJ Tucker, James Harden, Gerald Green, Danuel House Jr., James Ennis III", 
"PJ Tucker, James Harden, Gerald Green, Danuel House Jr., James Ennis III", 
"PJ Tucker, James Harden, Gerald Green, Danuel House Jr., James Ennis III"
), play_length = c("0S", "13S", "9S", "4S", "19S", "6S", "8S", 
"0S", "0S", "18S", "11S", "9S", "13S", "12S", "3S", "7S", "5S", 
"7S", "24S", "9S", "19S", "7S", "19S", "16S", "17S", "2S", "15S", 
"3S", "2S", "1S", "9S", "2S", "2S"), game_id = c(21800009, 21800009, 
21800009, 21800009, 21800039, 21800039, 21800039, 21800039, 21800039, 
21800039, 21800180, 21800180, 21800180, 21800180, 21800180, 21800180, 
21800180, 21800512, 21800512, 21800512, 21800512, 21800512, 21800512, 
21800512, 21800565, 21800565, 21800565, 21800565, 21800565, 21800610, 
21800610, 21800610, 21800610)), row.names = c(NA, -33L), class = c("tbl_df", 
"tbl", "data.frame"))

Please, let me know if it is what you want.

Upvotes: 0

Related Questions