Reputation: 1946
I have data on the positions of many athletes over the duration of a match. Each quarter of the match goes for a maximum of 30 minutes. An example of my data is:
> df
StartValue Athlete Quarter Position
1 0.00 Paul Q1 Bench
2 5.35 Paul Q1 Defender
3 19.26 Paul Q1 Bench
4 23.32 Paul Q1 Defender
5 0.00 Paul Q2 Bench
6 9.08 Paul Q2 Defender
7 13.11 Paul Q2 Defender
8 0.00 Paul Q3 Defender
9 7.36 Paul Q3 Defender
10 2.51 Paul Q3 Bench
11 6.44 Paul Q4 Bench
12 22.47 Paul Q4 Bench
13 0.00 Paul Q4 Defender
14 24.38 Paul Q4 Defender
15 11.36 Paul Q4 Defender
I now wish to create a new column df$EndValue
that takes the StartValue
of the row below and places it in the same column. When the last entry for a quarter occurs, a 30 must be placed in df$EndValue
. For example, the first few rows would be:
> df
StartValue Athlete Quarter Position EndValue
1 0.00 Paul Q1 Bench 5.35
2 5.35 Paul Q1 Defender 19.26
3 19.26 Paul Q1 Bench 23.32
4 23.32 Paul Q1 Defender 30.00
5 0.00 Paul Q2 Bench 9.08
My anticipated output for the data.frame would be:
Output <- data.frame(StartValue=c(0, 5.35, 19.26, 23.32,
0.00, 9.08, 13.11, 0,
2.51, 7.36, 0.0, 6.44,
11.36, 22.47, 24.38),
EndValue=c(5.35, 19.26, 23.32, 30,
9.08, 13.11, 30, 2.51,
7.36, 30, 6.44, 11.36,
22.47, 24.38, 30),
Athlete = c('Paul', 'Paul', 'Paul', 'Paul',
'Paul', 'Paul', 'Paul','Paul',
'Paul', 'Paul', 'Paul','Paul',
'Paul', 'Paul', 'Paul'),
Quarter = c('Q1', 'Q1', 'Q1', 'Q1',
'Q2', 'Q2', 'Q2', 'Q3',
'Q3', 'Q3', 'Q4', 'Q4',
'Q4', 'Q4', 'Q4'),
Position = c('Bench','Defender','Bench','Defender',
'Bench','Defender','Defender','Defender',
'Defender','Bench','Bench','Bench',
'Defender', 'Defender', 'Defender'))
I have data on many athletes for these 30 minute quarters, so how can I quickly add this new column?
Thank you.
Upvotes: 1
Views: 296
Reputation: 12640
Here's a solution using dplyr
:
library(dplyr)
quarter_lengths <- c(Q1 = 31, Q2 = 32, Q3 = 30, Q4 = 33)
df %>%
group_by(Athlete, Quarter) %>%
mutate(EndValue = c(StartValue[-1], quarter_lengths[Quarter[1]]))
If it gets more complicated, e.g. multiple games with different quarter lengths, I would create a new data.frame
with the quarter lengths and inner_join
them.
Upvotes: 1
Reputation: 12713
setDT
converts dataframe to data table. Group by Quarter
and assign the last value with 30 and make EndValue
column.
library('data.table')
EDIT:
In your comment, you asked for changing endvalue for every quarter with unique values. First assign StartValue
to EndValue
, and then find the row index of the last value in each quarter. In the next step, update the EndValue
with 31 for Q1, 32 for Q2, 33 for Q3 and 34 for Q4.
I created two players - Paul and Bob. They both have same data except their names.
# sample data
setDT( df ) # convert data frame to data table by reference
df1 <- copy(df) # replicate data by copying df
df[, Athlete := 'Bob'] # asssign Athlete with Bob player
df <- rbindlist(l = list( df1, df) ) # combine df1 and df
# sort StartValue by player and quarter
df <- df[order(StartValue), .SD, by = .( Athlete, Quarter ) ]
# assign start to endvalue and with unique number per player per quarter
df[, EndValue := StartValue ] # Assign StartValue to EndValue
# remove 1st, shift values up and assign NA to last
df[, EndValue := c( EndValue[-1], NA ), by = .(Athlete, Quarter )]
df[ i = df[, .I[.N], by = .(Quarter, Athlete)][, V1],
j = EndValue := rep( c(31,32,33,34),
length( df[, unique(Athlete) ] ) ) ]
df
# Athlete Quarter StartValue Position EndValue
# 1: Paul Q1 0.00 Bench 5.35
# 2: Paul Q1 5.35 Defender 19.26
# 3: Paul Q1 19.26 Bench 23.32
# 4: Paul Q1 23.32 Defender 31.00
# 5: Paul Q2 0.00 Bench 9.08
# 6: Paul Q2 9.08 Defender 13.11
# 7: Paul Q2 13.11 Defender 32.00
# 8: Paul Q3 0.00 Defender 2.51
# 9: Paul Q3 2.51 Bench 7.36
# 10: Paul Q3 7.36 Defender 33.00
# 11: Paul Q4 0.00 Defender 6.44
# 12: Paul Q4 6.44 Bench 11.36
# 13: Paul Q4 11.36 Defender 22.47
# 14: Paul Q4 22.47 Bench 24.38
# 15: Paul Q4 24.38 Defender 34.00
# 16: Bob Q1 0.00 Bench 5.35
# 17: Bob Q1 5.35 Defender 19.26
# 18: Bob Q1 19.26 Bench 23.32
# 19: Bob Q1 23.32 Defender 31.00
# 20: Bob Q2 0.00 Bench 9.08
# 21: Bob Q2 9.08 Defender 13.11
# 22: Bob Q2 13.11 Defender 32.00
# 23: Bob Q3 0.00 Defender 2.51
# 24: Bob Q3 2.51 Bench 7.36
# 25: Bob Q3 7.36 Defender 33.00
# 26: Bob Q4 0.00 Defender 6.44
# 27: Bob Q4 6.44 Bench 11.36
# 28: Bob Q4 11.36 Defender 22.47
# 29: Bob Q4 22.47 Bench 24.38
# 30: Bob Q4 24.38 Defender 34.00
# Athlete Quarter StartValue Position EndValue
Upvotes: 2