vitsuk
vitsuk

Reputation: 85

consecutively subtracting columns in data.table

Suppose I have the following data.table:

 player_id prestige_score_0 prestige_score_1 prestige_score_2 prestige_score_3 prestige_score_4
   1:    100284     0.0001774623     2.519792e-03     5.870781e-03     7.430179e-03     7.937716e-03
   2:    103819     0.0001774623     1.426482e-03     3.904329e-03     5.526974e-03     6.373850e-03
   3:    100656     0.0001774623     2.142518e-03     4.221423e-03     5.822705e-03     6.533448e-03
   4:    104745     0.0001774623     1.084913e-03     3.061197e-03     4.383649e-03     5.091851e-03
   5:    104925     0.0001774623     1.488457e-03     2.926728e-03     4.360301e-03     5.068171e-03

And I want to find the difference between values in each column starting from column prestige_score_0

In one step it should look like this: df[,prestige_score_0] - df[,prestige_score_1]

How can I do it in data.table(and save this differences as data.table and keep player_id as well)?

Upvotes: 0

Views: 1408

Answers (3)

mluerig
mluerig

Reputation: 749

you can subtract a whole dt with a shifted version of itself

dt = data.table(id=c("A","B"),matrix(rexp(10, rate=.1), ncol=5))
dt_shift = data.table(id=dt[,id], dt[, 2:(ncol(dt)-1)] - dt[,3:ncol(dt)])

Upvotes: 2

ozacha
ozacha

Reputation: 1352

This is how you can do this in a tidy way:

# make it tidy
df2 <- melt(df, 
            id = "player_id", 
            variable.name = "column_name", 
            value.name = "prestige_score")  
# extract numbers from column names
df2[, score_number := as.numeric(gsub("prestige_score_", "", column_name))]
# compute differences by player
df2[, diff := prestige_score - shift(prestige_score, n = 1L, type = "lead"),
    by = player_id]

# if necessary, reshape back to original format
dcast(df2, player_id ~ score_number, value.var = c("prestige_score", "diff"))

Upvotes: 3

phil_t
phil_t

Reputation: 861

You could use a for loop -

for(i in c(1:(ncol(df)-1)){
    df[, paste0("diff_", i-1, "_", i)] = df[, paste0("prestige_score_", i-1)] - 
                                              df[, paste0("prestige_score_", i)]
}

This might not be the most efficient if you have a lot of columns though.

Upvotes: 0

Related Questions