user2716568
user2716568

Reputation: 1946

Create new column from row below, based on variable

I have data on the positions of many athletes over the duration of a match. Each quarter of the match goes for a maximum of 30 minutes. An example of my data is:

> df
      StartValue Athlete Quarter Position
    1        0.00    Paul      Q1    Bench
    2        5.35    Paul      Q1 Defender
    3       19.26    Paul      Q1    Bench
    4       23.32    Paul      Q1 Defender
    5        0.00    Paul      Q2    Bench
    6        9.08    Paul      Q2 Defender
    7       13.11    Paul      Q2 Defender
    8        0.00    Paul      Q3 Defender
    9        7.36    Paul      Q3 Defender
    10       2.51    Paul      Q3    Bench
    11       6.44    Paul      Q4    Bench
    12      22.47    Paul      Q4    Bench
    13       0.00    Paul      Q4 Defender
    14      24.38    Paul      Q4 Defender
    15      11.36    Paul      Q4 Defender

I now wish to create a new column df$EndValue that takes the StartValue of the row below and places it in the same column. When the last entry for a quarter occurs, a 30 must be placed in df$EndValue. For example, the first few rows would be:

        > df
           StartValue Athlete Quarter Position EndValue
        1        0.00    Paul      Q1    Bench 5.35
        2        5.35    Paul      Q1 Defender 19.26
        3       19.26    Paul      Q1    Bench 23.32
        4       23.32    Paul      Q1 Defender 30.00
        5        0.00    Paul      Q2    Bench 9.08

My anticipated output for the data.frame would be:

Output <- data.frame(StartValue=c(0, 5.35, 19.26, 23.32, 
                                 0.00, 9.08, 13.11, 0, 
                                 2.51, 7.36, 0.0, 6.44,
                                 11.36, 22.47, 24.38), 
                     EndValue=c(5.35, 19.26, 23.32, 30,
                                9.08, 13.11, 30, 2.51,
                                7.36, 30, 6.44, 11.36,
                                22.47, 24.38, 30),
                    Athlete = c('Paul', 'Paul', 'Paul', 'Paul',
                                'Paul', 'Paul', 'Paul','Paul',
                                'Paul', 'Paul', 'Paul','Paul',
                                'Paul', 'Paul', 'Paul'),
                    Quarter = c('Q1', 'Q1', 'Q1', 'Q1', 
                                'Q2', 'Q2', 'Q2', 'Q3', 
                                'Q3', 'Q3', 'Q4', 'Q4', 
                                'Q4', 'Q4', 'Q4'),
                    Position = c('Bench','Defender','Bench','Defender',
                                 'Bench','Defender','Defender','Defender',
                                 'Defender','Bench','Bench','Bench', 
                                 'Defender', 'Defender', 'Defender'))

I have data on many athletes for these 30 minute quarters, so how can I quickly add this new column?

Thank you.

Upvotes: 1

Views: 296

Answers (2)

Nick Kennedy
Nick Kennedy

Reputation: 12640

Here's a solution using dplyr:

library(dplyr)
quarter_lengths <- c(Q1 = 31, Q2 = 32, Q3 = 30, Q4 = 33)
df %>%
  group_by(Athlete, Quarter) %>%
  mutate(EndValue = c(StartValue[-1], quarter_lengths[Quarter[1]]))

If it gets more complicated, e.g. multiple games with different quarter lengths, I would create a new data.frame with the quarter lengths and inner_join them.

Upvotes: 1

Sathish
Sathish

Reputation: 12713

setDT converts dataframe to data table. Group by Quarter and assign the last value with 30 and make EndValue column.

library('data.table')

EDIT:

In your comment, you asked for changing endvalue for every quarter with unique values. First assign StartValue to EndValue, and then find the row index of the last value in each quarter. In the next step, update the EndValue with 31 for Q1, 32 for Q2, 33 for Q3 and 34 for Q4.

I created two players - Paul and Bob. They both have same data except their names.

# sample data
setDT( df )  # convert data frame to data table by reference
df1 <- copy(df)  # replicate data by copying df
df[, Athlete := 'Bob']  # asssign Athlete with Bob player
df <- rbindlist(l = list( df1,  df) )  # combine df1 and df

# sort StartValue by player and quarter
df <- df[order(StartValue), .SD, by = .( Athlete, Quarter ) ]

# assign start to endvalue and with unique number per player per quarter
df[, EndValue := StartValue ]  # Assign StartValue to EndValue

# remove 1st, shift values up and assign NA to last
df[, EndValue := c( EndValue[-1], NA ), by = .(Athlete, Quarter )]  

df[ i = df[, .I[.N], by = .(Quarter, Athlete)][, V1], 
    j = EndValue := rep( c(31,32,33,34), 
                         length( df[, unique(Athlete) ] ) ) ]

df
#    Athlete Quarter StartValue Position EndValue
# 1:    Paul      Q1       0.00    Bench     5.35
# 2:    Paul      Q1       5.35 Defender    19.26
# 3:    Paul      Q1      19.26    Bench    23.32
# 4:    Paul      Q1      23.32 Defender    31.00
# 5:    Paul      Q2       0.00    Bench     9.08
# 6:    Paul      Q2       9.08 Defender    13.11
# 7:    Paul      Q2      13.11 Defender    32.00
# 8:    Paul      Q3       0.00 Defender     2.51
# 9:    Paul      Q3       2.51    Bench     7.36
# 10:    Paul      Q3       7.36 Defender    33.00
# 11:    Paul      Q4       0.00 Defender     6.44
# 12:    Paul      Q4       6.44    Bench    11.36
# 13:    Paul      Q4      11.36 Defender    22.47
# 14:    Paul      Q4      22.47    Bench    24.38
# 15:    Paul      Q4      24.38 Defender    34.00
# 16:     Bob      Q1       0.00    Bench     5.35
# 17:     Bob      Q1       5.35 Defender    19.26
# 18:     Bob      Q1      19.26    Bench    23.32
# 19:     Bob      Q1      23.32 Defender    31.00
# 20:     Bob      Q2       0.00    Bench     9.08
# 21:     Bob      Q2       9.08 Defender    13.11
# 22:     Bob      Q2      13.11 Defender    32.00
# 23:     Bob      Q3       0.00 Defender     2.51
# 24:     Bob      Q3       2.51    Bench     7.36
# 25:     Bob      Q3       7.36 Defender    33.00
# 26:     Bob      Q4       0.00 Defender     6.44
# 27:     Bob      Q4       6.44    Bench    11.36
# 28:     Bob      Q4      11.36 Defender    22.47
# 29:     Bob      Q4      22.47    Bench    24.38
# 30:     Bob      Q4      24.38 Defender    34.00
#     Athlete Quarter StartValue Position EndValue

Upvotes: 2

Related Questions