William88
William88

Reputation: 39

It is possible to take a previous row value grouping by two different columns?

DT:

Hteam            Ateam          Season      HT_Points   AT_Points
Grodig          Salzburg        2015/2016       23          29
Rapid Vienna    Altach          2015/2016       38          15
Ried            Austria Vienna  2015/2016       32          30
Sturm Graz      Mattersburg     2015/2016       30          17
Admira          Rapid Vienna    2015/2016       24          27
Altach          Ried            2015/2016       25          10
Austria Vienna  Sturm Graz      2015/2016       29          18
Mattersburg     Grodig          2015/2016       22          12
Salzburg        AC Wolfsberger  2015/2016       45          11
Rapid Vienna    Ried            2016/2017        3           0
Altach          AC Wolfsberger  2016/2017        3           0
Sturm Graz      Salzburg        2016/2017        3           0
St. Polten      Austria Vienna  2016/2017        0           3
Mattersburg     Admira          2016/2017        0           3
Salzburg        AC Wolfsberger  2016/2017        1           1
Ried            Sturm Graz      2016/2017        3           0
Altach          Rapid Vienna    2016/2017        6           0
Austria Vienna  Mattersburg     2016/2017        3           0

Desired Output:

Hteam            Ateam          Season      HT_Points   AT_Points HT_PointsTOTAL    AT_PointsTOTAL
Grodig          Salzburg        2015/2016       23          29           23 + ?         29 + ?
Rapid Vienna    Altach          2015/2016       38          15           38 + ?         15 + ? 
Ried            Austria Vienna  2015/2016       32          30           32 + ?         30 + ?
Sturm Graz      Mattersburg     2015/2016       30          17           30 + ?         17 + ?
Admira          Rapid Vienna    2015/2016       24          27           24 + ?            65
Altach          Ried            2015/2016       25          10             40              42
Austria Vienna  Sturm Graz      2015/2016       29          18             59              48
Mattersburg     Grodig          2015/2016       22          12             39              35
Salzburg        AC Wolfsberger  2015/2016       45          11             74           11 + ?
Rapid Vienna    Ried            2016/2017        3           0             NA              NA
Altach          AC Wolfsberger  2016/2017        3           0             NA              NA
Sturm Graz      Salzburg        2016/2017        3           0             NA              NA
St. Polten      Austria Vienna  2016/2017        0           3             NA              NA
Mattersburg     Admira          2016/2017        0           3             NA              NA
Salzburg        AC Wolfsberger  2016/2017        1           1              1              NA
Ried            Sturm Graz      2016/2017        3           0              3               3
Altach          Rapid Vienna    2016/2017        6           0             NA               3
Austria Vienna  Mattersburg     2016/2017        3           0              6               0

HT_PointsTOTAL = HT_Points + AT_Points(last game played as Ateam by Hteam)
AT_PointsTOTAL = AT_Points + HT_Points(last game played as Hteam by Ateam)

Note: ? --> It should be a number. 
            It has been put like this since the rows it refers to are not shown.
     NA --> No previous game on that Season by Hteam as Ateam or by Ateam as Hteam.

I know what to look for a value from a previous row you can use shift. But in this case I do not know how to do it since the name of the team is the same but in different columns(Hteam & Ateam).

Maybe with shift you can not do what I want to do. The goal is to add the total points of a team. That is, when playing at home, you have to look for the points from the last time that the team played as a visitor and add them(and vice versa).

Maybe the only solution is to use a function to create the new columns. But I don't know how to do it.

It is necessary to use the Season column to group by.

If it is possible to use data.table package.

Upvotes: 2

Views: 102

Answers (2)

krads
krads

Reputation: 1369

UPDATED
Very similar to @chinsoon12 with data.table but avoiding using .SD so IMO more concise:

library(data.table)
setDT(DT)

DT[, rn := .I]

# Join to get away points (i.AT_Points) for Hteam
DT[DT, 
   HT_PointsTOTAL := HT_Points + i.AT_Points,
   on = .(Hteam=Ateam, Season=Season, rn>rn)]  # note rn>rn (using greater than here)

# Join to get home points (i.HT_Points) for Ateam
DT[DT, 
   AT_PointsTOTAL := AT_Points + i.HT_Points,
   on = .(Ateam=Hteam, Season=Season, rn<rn)]  # note nr<rn (using less than here)

DT

Resulting in (for your updated sample data):

             Hteam          Ateam    Season HT_Points AT_Points rn HT_PointsTOTAL AT_PointsTOTAL
 1:         Grodig       Salzburg 2015/2016        23        29  1             NA             NA
 2:   Rapid Vienna         Altach 2015/2016        38        15  2             NA             NA
 3:           Ried Austria Vienna 2015/2016        32        30  3             NA             NA
 4:     Sturm Graz    Mattersburg 2015/2016        30        17  4             NA             NA
 5:         Admira   Rapid Vienna 2015/2016        24        27  5             NA             65
 6:         Altach           Ried 2015/2016        25        10  6             40             42
 7: Austria Vienna     Sturm Graz 2015/2016        29        18  7             59             48
 8:    Mattersburg         Grodig 2015/2016        22        12  8             39             35
 9:       Salzburg AC Wolfsberger 2015/2016        45        11  9             74             NA
10:   Rapid Vienna           Ried 2016/2017         3         0 10             NA             NA
11:         Altach AC Wolfsberger 2016/2017         3         0 11             NA             NA
12:     Sturm Graz       Salzburg 2016/2017         3         0 12             NA             NA
13:     St. Polten Austria Vienna 2016/2017         0         3 13             NA             NA
14:    Mattersburg         Admira 2016/2017         0         3 14             NA             NA
15:       Salzburg AC Wolfsberger 2016/2017         1         1 15              1             NA
16:           Ried     Sturm Graz 2016/2017         3         0 16              3              3
17:         Altach   Rapid Vienna 2016/2017         6         0 17             NA              3
18: Austria Vienna    Mattersburg 2016/2017         3         0 18              6              0

If you like you can remove the rn column when you're done:

DT$rn <- NULL

Upvotes: 0

chinsoon12
chinsoon12

Reputation: 25225

Here is an approach using data.table non-equi join using row number to ensure that we only pick from previous rows:

library(data.table)
setDT(DT)
DT[, rn := .I]

#calculate home team points first
DT[, HT_PointsTotal := 
    .SD[.SD, .(x.AT_Points + i.HT_Points), on=c("Season"="Season", "Ateam"="Hteam", "rn<rn")]]

#then calculate away team points
DT[, AT_PointsTotal := 
    .SD[.SD, .(x.HT_Points + i.AT_Points), on=c("Season"="Season", "Hteam"="Ateam", "rn<rn")]]

Adding a roll approach when dataset gets larger and there is Cartesian join error due to Hteam appearing multiple times in Ateam column.

dummy[, rn := .I]
dummy[, HT_PointsTotal :=
        .SD[.SD, .(x.AT_Points + i.HT_Points), on=c("Season", "Ateam"="Hteam", "rn"), roll=Inf]
    ]

dummy[, AT_PointsTotal :=
        .SD[.SD, .(x.HT_Points + i.AT_Points), on=c("Season", "Ateam"="Hteam", "rn"), roll=Inf]
    ]

dummy data (this was time consuming to create and does not reflect reality as well):

library(data.table)
DT <- fread("Hteam,Ateam,Season,HT_Points,AT_Points
Grodig,Salzburg,2015/2016,23,29
Rapid Vienna,Altach,2015/2016,38,15
Ried,Austria Vienna,2015/2016,32,30
Sturm Graz,Mattersburg,2015/2016,30,17
Admira,Rapid Vienna,2015/2016,24,27
Altach,Ried,2015/2016,25,10
Austria Vienna,Sturm Graz,2015/2016,29,18
Mattersburg,Grodig,2015/2016,22,12
Salzburg,AC Wolfsberger,2015/2016,45,11")

numTeams <- DT[,uniqueN(c(Hteam, Ateam))]

firstHalf <- lapply(seq_len(DT[,.N]),
    function(n) data.table(
        Matchday=n*2L-1L,
        Hteam=DT[["Hteam"]],
        Ateam=c(DT[["Ateam"]][-seq_len(n)], DT[["Ateam"]][seq_len(n)]),
        Season=DT[["Season"]],
        HT_Points=DT[["HT_Points"]],
        AT_Points=DT[["AT_Points"]]
    ))

secondHalf <- lapply(seq_len(DT[,.N]),
    function(n) data.table(
        Matchday=n*2L,
        Hteam=DT[["Ateam"]],
        Ateam=c(DT[["Hteam"]][-seq_len(n)], DT[["Hteam"]][seq_len(n)]),
        Season=DT[["Season"]],
        HT_Points=DT[["HT_Points"]],
        AT_Points=DT[["AT_Points"]]
    ))


dummy <- rbindlist(c(firstHalf, secondHalf))[
    Hteam!=Ateam][,
        .SD[1L], by=.(Hteam, Ateam)]
setorder(dummy, Matchday, Hteam)

Upvotes: 1

Related Questions