Reputation: 2077
I have a data table of basketball player data which includes the game date for each game and multiple players. I want to create a column that computes the number of days since the previous game. I'm using the data.table package in R.
PLAYERID GAME_DATE
1: 2989 2014-01-1
2: 2989 2014-01-3
I'm using the following code:
DT[, DAY_DIFF:=diff(GAME_DATE, lag=1), by=PLAYERID]
it returns:
PLAYERID GAME_DATE DAY_DIFF
1: 2989 2014-01-1 2
2: 2989 2014-01-3 2
where it appends the number of days until the next game for every row except the last. In the last row, it appends the number of days since the previous game, the value I want. I want to put an NA in the first row because it is the first game.
Upvotes: 2
Views: 1323
Reputation: 369
Maybe a more intuitive and readable solution:
library(data.table)
library(dplyr)
library(magrittr)
# Reproduce the data.table
DT <- fread("PLAYERID GAME_DATE\n2989 2014-01-1\n2989 2014-01-3\n2989 2014-07-10\n2900 2014-01-10")
DT[, GAME_DATE:=as.Date(GAME_DATE)]
# Use dplyr to lag
group_by( DT, PLAYERID ) %>%
mutate( DAY_DIFF = GAME_DATE - lag(GAME_DATE) )
#Source: local data table [4 x 3]
#Groups: PLAYERID
#
# PLAYERID GAME_DATE DAY_DIFF
#1 2900 2014-01-10 NA
#2 2989 2014-01-01 NA
#3 2989 2014-01-03 2
#4 2989 2014-07-10 188
Upvotes: 0
Reputation: 49810
# Reproduce the data.table
DT <- fread("PLAYERID GAME_DATE\n2989 2014-01-1\n2989 2014-01-3")
DT[, GAME_DATE:=as.Date(GAME_DATE)]
# concatenate an NA to the front and use na.pad=FALSE with diff()
DT[, DAY_DIFF:=c(NA, diff(GAME_DATE, lag=1, na.pad=FALSE)), by=PLAYERID]
DT
# PLAYERID GAME_DATE DAY_DIFF
# 1: 2989 2014-01-01 NA
# 2: 2989 2014-01-03 2
Upvotes: 3