sjhuk
sjhuk

Reputation: 323

Converting Columns to Rows with two summary values in R

I have a data.frame like this:

P   Stat    V   Points
1   Goals   2   10
1   Assists 1   3
2   Goals   1   5
2   Assists 1   3

and I would like to convert it to something like this:

P   Goals   Assists Points
1   2       1       13
2   1       1       8

Currently I am using dcast, as follows:

dcast(stats, P ~ Stat, value.var = "V"), which works just without the "Points". When I add Points in, it starts duplicating the new columns names with _1, _2 etc.

Any help appreciated. This is not a school project, I'm just a curious consultant trying to refresh my statistics skill-set on problem that interests me!

Upvotes: 1

Views: 53

Answers (2)

Uwe
Uwe

Reputation: 42544

There are (at least) two possibilities to achieve the desired result.

Cast with multiple value.vars

The recent versions of data.table allow to specify multiple value.var as parameter to dcast():

library(data.table)   # version 1.10.4 used
dcast(DT, P ~ Stat, value.var = list("V", "Points"))
#   P V_Assists V_Goals Points_Assists Points_Goals
#1: 1         1       2              3           10
#2: 2         1       1              3            5

If only one Points column is desired, the points need to be added and the unnecessary columns to be deleted. By chaining, this can be done in one statement but isn't very concise.

dcast(DT, P ~ Stat, value.var = list("V", "Points"))[
  , Points := Points_Assists + Points_Goals][
    , c("Points_Assists", "Points_Goals") := NULL][]
#   P V_Assists V_Goals Points
#1: 1         1       2     13
#2: 2         1       1      8

Cast and join

Alternatively, the dcast of V and the aggregation of points can be done in seperate steps and the results being joined afterwards:

# dcast
temp1 <- dcast(DT, P ~ Stat, value.var = "V")
temp1
#   P Assists Goals
#1: 1       1     2
#2: 2       1     1

# sum points by P
temp2 <- DT[, .(Points = sum(Points)), by = P]
temp2
#   P Points
#1: 1     13
#2: 2      8

Now the two results need to be joined:

temp1[temp2, on = "P"]
#   P Assists Goals Points
#1: 1       1     2     13
#2: 2       1     1      8

or combined in one statement:

dcast(DT, P ~ Stat, value.var = "V")[DT[, .(Points = sum(Points)), by = P], on = "P"]

The code looks more straightforward and concise than the first variant.

Data

library(data.table)
DT <- fread(
  "P   Stat    V   Points
  1   Goals   2   10
  1   Assists 1   3
  2   Goals   1   5
  2   Assists 1   3")

Note that fread() returns a data.table object by default. In case DT is still a data.frame it needs to be coerced to data.table by

setDT(DT) 

Upvotes: 1

akrun
akrun

Reputation: 887048

We can do a dcast and then add the 'Points' column

library(data.table)
dcast(setDT(d1), P~Stat, value.var = "V")[, Points := sum(d1$Points)][]
#     P Assists Goals Points
#1: 1       1     2     13

Upvotes: 1

Related Questions