Reputation: 329
I have a dataframe 'df' where I want to summarize how many times each 'user' has a higher 'total' value for each head-to-head 'game'. My data frame looks like this:
game | user | total |
---|---|---|
1 | L | 55 |
1 | J | 60 |
2 | L | 64 |
2 | J | 77 |
3 | L | 90 |
3 | J | 67 |
4 | L | 98 |
4 | J | 88 |
5 | L | 71 |
5 | J | 92 |
The summary would state that L had a higer total in 2 games and J had a higher total in 3 games.
Thank you!
Upvotes: 1
Views: 94
Reputation: 26695
Is this the type of output you want?
library(tidyverse)
df <- structure(list(game = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
user = c("L", "J", "L", "J", "L", "J", "L", "J", "L", "J"),
total = c(55L, 60L, 64L, 77L, 90L, 67L, 98L, 88L, 71L, 92L)
), class = "data.frame", row.names = c(NA, -10L))
df %>%
group_by(game) %>%
slice_max(order_by = total,
n = 1,
with_ties = TRUE) %>%
group_by(user) %>%
summarise(wins = n())
#> # A tibble: 2 × 2
#> user wins
#> <chr> <int>
#> 1 J 3
#> 2 L 2
Created on 2022-01-20 by the reprex package (v2.0.1)
If you have a draw, then the above method counts that as a 'win' for both users. To count a draw as 'no winner' for both users (e.g. shown in game 1, below), you could use:
library(tidyverse)
df <- structure(list(game = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
user = c("L", "J", "L", "J", "L", "J", "L", "J", "L", "J"),
total = c(55L, 55L, 64L, 77L, 90L, 67L, 98L, 88L, 71L, 92L)
), class = "data.frame", row.names = c(NA, -10L))
df
#> game user total
#> 1 1 L 55
#> 2 1 J 55
#> 3 2 L 64
#> 4 2 J 77
#> 5 3 L 90
#> 6 3 J 67
#> 7 4 L 98
#> 8 4 J 88
#> 9 5 L 71
#> 10 5 J 92
df %>%
group_by(game) %>%
distinct(total, .keep_all = TRUE) %>%
filter(n() >= 2) %>%
slice_max(order_by = total,
n = 1,
with_ties = FALSE) %>%
group_by(user) %>%
summarise(win = n())
#> # A tibble: 2 × 2
#> user win
#> <chr> <int>
#> 1 J 2
#> 2 L 2
Created on 2022-01-20 by the reprex package (v2.0.1)
Upvotes: 1
Reputation: 7116
We can group the data by game, slice_max
and then count
the resulting data.
library(tidyverse)
df %>% group_by(game) %>%
slice_max(total) %>%
ungroup() %>%
count(user)
#> # A tibble: 2 × 2
#> user n
#> <chr> <int>
#> 1 J 3
#> 2 L 2
Created on 2022-01-20 by the reprex package (v2.0.1)
Note that if there's a tie, it will add one to both teams:
library(tidyverse)
df <-
read_table('game user total
1 L 60
1 J 60
2 L 64
2 J 77
3 L 90
3 J 67
4 L 98
4 J 88
5 L 71
5 J 92')
df %>% group_by(game) %>%
slice_max(total) %>%
ungroup() %>%
count(user)
#> # A tibble: 2 × 2
#> user n
#> <chr> <int>
#> 1 J 3
#> 2 L 3
Created on 2022-01-20 by the reprex package (v2.0.1)
Upvotes: 1
Reputation: 28705
Same approach as Vinay, using data.table
library(data.table)
setDT(df)
df[order(total), tail(.SD, 1), game][, .N, user]
#> user N
#> <char> <int>
#> 1: J 3
#> 2: L 2
Created on 2022-01-19 by the reprex package (v2.0.1)
Data used:
df <- structure(list(game = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L
), user = c("L", "J", "L", "J", "L", "J", "L", "J", "L", "J"),
total = c(55L, 60L, 64L, 77L, 90L, 67L, 98L, 88L, 71L, 92L
)), row.names = c(NA, -10L), class = "data.frame")
Upvotes: 2
Reputation: 253
Assuming df is your dataframe the following should give you the long form summary.
df %>%
arrange(game,desc(total)) %>% #we sort descending to ensure winner row is first.
group_by(game) %>% # we group the rows per game, this allows for winner row to be first in each group
slice_head(n=1)%>% #get first row in each group i.e winner row
ungroup()
Output:
# A tibble: 5 × 3
game user total
<int> <chr> <int>
1 1 J 60
2 2 J 77
3 3 L 90
4 4 L 98
5 5 J 92
If you just want the user wise summary add count to the code as follows:
df %>%
arrange(game,desc(total)) %>% #we sort descending to ensure winner row is first.
group_by(game) %>% # we group the rows per game, this allows for winner row to be first in each group
slice_head(n=1) %>% #get first row in each group i.e winner row
ungroup() %>%
count(user)
Output:
# A tibble: 2 × 2
user n
<chr> <int>
1 J 3
2 L 2
Upvotes: 1