Reputation: 1563
I'm trying to count totals for goals, primary assists, and secondary assists for each player. My problem is that I can't get my head around the logic to do that, as the data I want to summarize by (player name) is listed across three variables (goal, primary assist and secondary assist)
Here's my reproducible data (it's from dput()
, so apologies for the mess).
mydata <- structure(list(primary_assist = c("Dmitry Gilyazitdinov", "Evgeny Orlov",
"Anton Burdasov", "Sergei Kalinin", "Stanislav Solovyov", "Vasily Streltsov",
NA, "Bogdan Potekhin", "Bogdan Potekhin", "Vasily Streltsov",
"Vasily Streltsov", "Viktor Postnikov", "Danil Kaskov", NA, NA,
"Artemy Panarin"), secondary_assist = c("Andrei Badrutdinov",
NA, NA, NA, "Danil Gubarev", "Nikita Manukhov", NA, "Evgeny Grigorenko",
"Daniil Apalkov", "Ivan Boiko", NA, "Viktor Antipin", "Vitaly Sychov",
NA, NA, "Stanislav Levin"), goal = c("Vitaly Kropachyov", "Dmitry Kozlov",
"Stanislav Solovyov", "Kirill Polyansky", "Anton Burdasov", "Ilya Solodov",
"Alexander Antropov", "Daniil Apalkov", "Evgeny Grigorenko",
"Alexander Antropov", "Alexander Antropov", "Evgeny Grigorenko",
"Denis Belonogov", "Vitaly Sychov", "Alexander Streltsov", "Pyotr Kopyttsov"
), team = c("Belye Medvedi", "Omskie Yastreby", "Belye Medvedi",
"Omskie Yastreby", "Belye Medvedi", "Avto", "Avto", "Stalnye Lisy",
"Stalnye Lisy", "Avto", "Avto", "Stalnye Lisy", "Avto", "Avto",
"Avto", "Russkie Vityazi"), game_strength = c("PP", "EV", "EV",
"EV", "EV", "PP", "SO", "EV", "PP", "PP", "EV", "PP", "PP", "EV",
"PP", "EV"), season = c("2009-10", "2009-10", "2009-10", "2009-10",
"2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10",
"2009-10", "2009-10", "2009-10", "2009-10", "2009-10", "2009-10"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-16L), .Names = c("primary_assist", "secondary_assist", "goal",
"team", "game_strength", "season"))
mydata
#> # A tibble: 16 x 6
#> primary_assist secondary_assist goal team game_strength season
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Dmitry Gilyazitdinov Andrei Badrutdin~ Vita~ Bely~ PP 2009-~
#> 2 Evgeny Orlov <NA> Dmit~ Omsk~ EV 2009-~
#> 3 Anton Burdasov <NA> Stan~ Bely~ EV 2009-~
#> 4 Sergei Kalinin <NA> Kiri~ Omsk~ EV 2009-~
#> 5 Stanislav Solovyov Danil Gubarev Anto~ Bely~ EV 2009-~
#> 6 Vasily Streltsov Nikita Manukhov Ilya~ Avto PP 2009-~
#> 7 <NA> <NA> Alex~ Avto SO 2009-~
#> 8 Bogdan Potekhin Evgeny Grigorenko Dani~ Stal~ EV 2009-~
#> 9 Bogdan Potekhin Daniil Apalkov Evge~ Stal~ PP 2009-~
#> 10 Vasily Streltsov Ivan Boiko Alex~ Avto PP 2009-~
#> 11 Vasily Streltsov <NA> Alex~ Avto EV 2009-~
#> 12 Viktor Postnikov Viktor Antipin Evge~ Stal~ PP 2009-~
#> 13 Danil Kaskov Vitaly Sychov Deni~ Avto PP 2009-~
#> 14 <NA> <NA> Vita~ Avto EV 2009-~
#> 15 <NA> <NA> Alex~ Avto PP 2009-~
#> 16 Artemy Panarin Stanislav Levin Pyot~ Russ~ EV 2009-~
So, I want to count the number of goals, primary assists, and secondary assists for each player, and then have 1 row for each player. Let's say the name "Artemy Panarin" is listed 1 time in goals, 0 times in primary assists, and 2 times in secondary assists, my output would look like this:
tibble::tibble(name = c("Artemy Panarin", "Stanislav Levin", "Danil Kaskov"), team = c("Russkie Vityazi", "Russkie Vityazi", "Avto"), goals = c(1, 1, 0), primary_assists = c(0, 0, 1), secondary_assists = c(2, 0, 0))
#> # A tibble: 3 x 5
#> name team goals primary_assists secondary_assists
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Artemy Panarin Russkie Vityazi 1.00 0 2.00
#> 2 Stanislav Levin Russkie Vityazi 1.00 0 0
#> 3 Danil Kaskov Avto 0 1.00 0
Does that make any sense? Any ideas? Tidyverse solutions preferred. Thanks!
Upvotes: 6
Views: 103
Reputation: 129
You can use gather and spread. First gather the goal and assists columns into a "key", and then group by key and player. You can conver the NA to 0s later
library(tidyverse)
mydata_tidy <- mydata %>%
gather(key = "key", value = "player", primary_assist, secondary_assist, goal) %>%
na.omit()
mydata_tidy %>%
group_by(key, player) %>%
summarize(count = n()) %>%
spread(key, count) %>%
filter(player %in% c("Artemy Panarin", "Stanislav Levin", "Danil Kaskov"))
#> # A tibble: 3 x 4
#> player goal primary_assist secondary_assist
#> <chr> <int> <int> <int>
#> 1 Artemy Panarin NA 1 NA
#> 2 Danil Kaskov NA 1 NA
#> 3 Stanislav Levin NA NA 1
Created on 2018-07-18 by the reprex package (v0.2.0).
Upvotes: 2
Reputation: 887551
We can gather
to 'long' format, grouped by 'name', 'team', and the 'key' column (from gather
), summarise
to get the count and spread
back to 'wide' format
library(tidyverse)
gather(mydata, key, name, primary_assist:goal) %>%
group_by(name, team, key) %>%
summarise(n = n()) %>%
spread(key, n, fill = 0)
# A tibble: 30 x 5
# Groups: name, team [30]
# name team goal primary_assist secondary_assist
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 Alexander Antropov Avto 3 0 0
# 2 Alexander Streltsov Avto 1 0 0
# 3 Andrei Badrutdinov Belye Medvedi 0 0 1
# 4 Anton Burdasov Belye Medvedi 1 1 0
# 5 Artemy Panarin Russkie Vityazi 0 1 0
# 6 Bogdan Potekhin Stalnye Lisy 0 2 0
# 7 Daniil Apalkov Stalnye Lisy 1 0 1
# 8 Danil Gubarev Belye Medvedi 0 0 1
# 9 Danil Kaskov Avto 0 1 0
#10 Denis Belonogov Avto 1 0 0
# ... with 20 more rows
Upvotes: 6
Reputation: 10671
One way to get the result is to reshape your data with gather()/spread()
in addition to your summarize strategy.
library(tidyverse)
scoring_summary <- mydata %>%
select(primary_assist:team) %>%
gather("key", "player", -team) %>%
group_by(player) %>%
count(key) %>%
spread(key, n)
# convert NAs to 0
scoring_summary[is.na(scoring_summary)] <- 0
scoring_summary
# A tibble: 28 x 4
# Groups: player [28]
player goal primary_assist secondary_assist
<chr> <dbl> <dbl> <dbl>
1 Alexander Antropov 3 0 0
2 Alexander Streltsov 1 0 0
3 Andrei Badrutdinov 0 0 1
4 Anton Burdasov 1 1 0
5 Artemy Panarin 0 1 0
6 Bogdan Potekhin 0 2 0
7 Daniil Apalkov 1 0 1
8 Danil Gubarev 0 0 1
9 Danil Kaskov 0 1 0
10 Denis Belonogov 1 0 0
count()
is doing the same thing as your original attempt with summarise(count(goals)
Upvotes: 3