Metsfan

Reputation: 520

Not getting subtotals when grouping in R

Every time the player changes, I need a subtotal of how many strikeouts he had in his career.

I tried the code below, but I was not getting subtotals.

player <- c('acostma01', 'acostma01', 'acostma01', 'adkinjo01', 'aguilri01', 'aguilri01', 'aguilri01', 'aguilri01', 'aguilri01')
year <- c(2010,2011,2012,2007,1985,1986,1987,1988,1989)
games <- c(41,44,45,1,21,28,18,11,36)
strikeouts <- c(42,46,46,0,74,104,77,16,80)
bb_data <- data.frame(player, year, games, strikeouts, stringsAsFactors = FALSE)

Here is the code that did not work:

mets <- select(bb_data, player, year, games, strikeouts) %>% 
group_by(player, year) %>% 
colSums(SO)

Here is the output I would like to get:

player      games strikeouts
acostma01   130   134
adkinjo01   1     0
aguilri01   0     351
Grand Total       485

Here is what I was getting (tail of data):

player    team    year  games strikeouts
<chr>     <chr>   <int> <int> <int>
swarzan01 NYN      2018    29    31
syndeno01 NYN      2018    25   155
vargaja01 NYN      2018    20    84
wahlbo01  NYN      2018     7     7
wheelza01 NYN      2018    29   179
zamorda01 NYN      2018    16    16

Upvotes: 0

Views: 47

Answers (2)

ZiGaelle

Reputation: 744

If you don't mind the year column being summed as well, you can do this:

library(data.table)
data = setDT(bb_data)[, c(lapply(.SD, sum), .N), by = player]

.N counts the number of rows per player (here, the number of years).

Then you can sort it (the leading - gives descending order):

data[order(-data$strikeouts)]

You get this result:

      player year games strikeouts N
1: aguilri01 9935   114        351 5
2: acostma01 6033   130        134 3
3: adkinjo01 2007     1          0 1
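
If you would rather not have year summed, one option is to restrict .SD with .SDcols. The following is only a minimal sketch on the sample data from the question; the names dt, totals, grand and seasons are my own choices, and I use as.data.table() instead of setDT() here so that bb_data is left unmodified:

```r
library(data.table)

# Sample data from the question, rebuilt so the sketch is self-contained
bb_data <- data.frame(
  player = c("acostma01", "acostma01", "acostma01", "adkinjo01",
             "aguilri01", "aguilri01", "aguilri01", "aguilri01", "aguilri01"),
  year = c(2010, 2011, 2012, 2007, 1985, 1986, 1987, 1988, 1989),
  games = c(41, 44, 45, 1, 21, 28, 18, 11, 36),
  strikeouts = c(42, 46, 46, 0, 74, 104, 77, 16, 80),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so bb_data stays a plain data.frame
dt <- as.data.table(bb_data)

# Sum only games and strikeouts per player; .N counts the rows (seasons)
totals <- dt[, c(lapply(.SD, sum), list(seasons = .N)),
             by = player, .SDcols = c("games", "strikeouts")]

# Sort by strikeouts, descending
totals <- totals[order(-strikeouts)]

# Optional grand-total row (hypothetical layout: only strikeouts is filled in)
grand <- data.table(player = "Grand Total", games = NA_real_,
                    strikeouts = sum(totals$strikeouts), seasons = NA_integer_)
totals_with_grand <- rbind(totals, grand)
```

Because year is not listed in .SDcols, it no longer appears in the result at all.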

Upvotes: 1

arg0naut91

Reputation: 14764

You could do:

library(tidyverse)

bb_data %>% 
  group_by(player) %>% 
  summarise_at(vars(games, strikeouts), sum) %>%
  add_row(player = 'Grand Total', games = NA, strikeouts = sum(.$strikeouts))

This would give you:

# A tibble: 4 x 3
  player      games strikeouts
  <chr>       <dbl>      <dbl>
1 acostma01     130        134
2 adkinjo01       1          0
3 aguilri01     114        351
4 Grand Total    NA        485

This matches your expected output except for games for aguilri01 (114 here versus 0 in the question). I presume that is a typo, but let me know if it is not.

For descending order, you could do:

bb_data %>% 
  group_by(player) %>% 
  summarise_at(vars(games, strikeouts), sum) %>%
  arrange(-strikeouts) %>%
  add_row(player = 'Grand Total', games = NA, strikeouts = sum(.$strikeouts))

Output:

# A tibble: 4 x 3
  player      games strikeouts
  <chr>       <dbl>      <dbl>
1 aguilri01     114        351
2 acostma01     130        134
3 adkinjo01       1          0
4 Grand Total    NA        485

To also include the seasons played, you can try:

bb_data %>% 
  group_by(player) %>% 
  mutate(seasons_played = n_distinct(year)) %>%
  group_by(player, seasons_played) %>%
  summarise_at(vars(games, strikeouts), sum) %>% 
  arrange(-strikeouts) %>%
  ungroup() %>%
  add_row(player = 'Grand Total', games = NA, seasons_played = NA, strikeouts = sum(.$strikeouts))
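
As an alternative sketch, assuming dplyr 1.0.0 or later, the same table can be built in a single summarise() call using across(), with bind_rows() appending the total row. The names player_totals and result are just my own labels, and the sample data is repeated so the snippet runs on its own:

```r
library(dplyr)

# Sample data from the question, rebuilt so the sketch is self-contained
bb_data <- data.frame(
  player = c("acostma01", "acostma01", "acostma01", "adkinjo01",
             "aguilri01", "aguilri01", "aguilri01", "aguilri01", "aguilri01"),
  year = c(2010, 2011, 2012, 2007, 1985, 1986, 1987, 1988, 1989),
  games = c(41, 44, 45, 1, 21, 28, 18, 11, 36),
  strikeouts = c(42, 46, 46, 0, 74, 104, 77, 16, 80),
  stringsAsFactors = FALSE
)

# One row per player: distinct seasons plus summed games and strikeouts
player_totals <- bb_data %>%
  group_by(player) %>%
  summarise(
    seasons_played = n_distinct(year),
    across(c(games, strikeouts), sum)
  ) %>%
  arrange(desc(strikeouts))

# bind_rows() fills the columns missing from the total row with NA
result <- bind_rows(
  player_totals,
  tibble(player = "Grand Total", strikeouts = sum(player_totals$strikeouts))
)
```

Computing the total from player_totals rather than from .$strikeouts inside the pipe keeps the sketch independent of the magrittr dot placeholder.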

Upvotes: 2
