klassic123
klassic123

Reputation: 29

Assign Value based on group with data.table in r

I have the following data set:

Name         Make_Miss       Half        
Player A         1             1                
Player B         1             1                
Player A         0             2                
Player A         0             1                
Player A         1             1                
Player B         0             2      

Where Name is the player's name, Make_Miss is whether or not the player made that shot, and Half is which half the shot took place. I am currently using the following code in order to compute the first half made shots count.

Code:

dt[ , Player_First_Made := .N, by = list(dt$Name == "Player A" & dt$Half == 1 & dt$Make_Miss == 1)]

The output:

Name         Make_Miss       Half        Player_First_Made
Player A         1             1                2
Player B         1             1                4
Player A         0             2                4
Player A         0             1                4
Player A         1             1                2
Player B         0             2                4

What is happening here is that wherever Player A has the input of 0 in the Make_Miss column, then the respective row in the Player_First_Made column gets assigned the value of the count of shots that do not match criteria in the list (i.e. Name != Player A or Half != 1 or Make_Miss != 1); however, my desire is the following:

Name         Make_Miss       Half        Player_First_Made
Player A         1             1                2
Player B         1             1                4
Player A         0             2                2
Player A         0             1                2
Player A         1             1                2
Player B         0             2                4

I want the rows that match Name = Player A to always have the value of however many shots they made in the first half. Is there some sort of syntax for data.table that I can specify this assignment?

Upvotes: 0

Views: 789

Answers (2)

Ben K
Ben K

Reputation: 76

A data.table way to do this would be:

dat[Half == 1, .(Player_First_Made = sum(Make_Miss)), .(Name)
    ][dat, on = c('Name')]

Where the first line counts the number of times (sum(Make_Miss)) each player (.(Name)) made a shot in the first half (Half == 1).

The second line joins the resulting aggregated table from the step above back into the original dataset.

Here's the sample data I used:

dat <-
  data.table(
    Name = c('A', 'B'),
    Make_Miss = round(runif(30, 0, 1)),
    Half = round(runif(30, 1, 2))
  )

Upvotes: 1

EcologyTom
EcologyTom

Reputation: 2510

As @chinsoon12 points out, the data you have provided don't really make sense. However, here is a method using dplyr which I think will give you what you want...

library(dplyr)

# Make some data
DATA <- data.frame(Name = c("Player A", "Player B", "Player C",
 "Player A", "Player A", "Player B"), Make_Miss = c(1,1,0,0,1,0),
 Half = c(1,1,2,1,2,2))

# Use dplyr to calculate the sums of 'Half' for each player
OUT <- DATA %>% group_by(Name) %>% mutate(Player_First_Made = sum(Half))

# Check the output
> OUT
# A tibble: 6 x 4
# Groups:   Name [3]
  Name     Make_Miss  Half Player_First_Made
  <fct>        <dbl> <dbl>             <dbl>
1 Player A      1     1                 4
2 Player B      1     1                 3 
3 Player C      0     2                 2
4 Player A      0     1                 4 
5 Player A      1     2                 4 
6 Player B      0     2                 3 

If this isn't what you are looking for, then please edit your question to make it clearer.

Upvotes: 1

Related Questions