Nick Knauer
Nick Knauer

Reputation: 4243

Add Group By in For-Loop

I have a dataset as follows:

# Define Adstock Rate
adstock_rate = 0.50

# Create Data
advertising = c(117.913, 120.112, 125.828, 115.354, 177.090, 141.647, 137.892,   0.000,   0.000,   0.000,   0.000, 
            0.000,   0.000,   0.000,   0.000,   0.000,   0.000, 158.511, 109.385,  91.084,  79.253, 102.706, 
            78.494, 135.114, 114.549,  87.337, 107.829, 125.020,  82.956,  60.813,  83.149,   0.000,   0.000, 
            0.000,   0.000,   0.000,   0.000, 129.515, 105.486, 111.494, 107.099,   0.000,   0.000,   0.000, 
            0.000,   0.000,   0.000,   0.000,   0.000,   0.000,   0.000,   0.000,
            134.913, 123.112, 178.828, 112.354, 100.090, 167.647, 177.892,   0.000,   0.000,   0.000,   0.000, 
            0.000,   0.000,   0.000,   0.000,   0.000,   0.000, 112.511, 155.385,  123.084,  89.253, 67.706, 
            23.494, 122.114, 112.549,  65.337, 134.829, 123.020,  81.956,  23.813,  65.149,   0.000,   0.000, 
            0.000,   0.000,   0.000,   0.000, 145.515, 154.486, 121.494, 117.099,   0.000,   0.000,   0.000, 
            0.000,   0.000,   0.000,   0.000,   0.000,   0.000,   0.000,   0.000
            )

Region = c(500, 500, 500, 500, 500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,
       500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500, 500, 500, 
       500, 500,
       501, 501, 501, 501, 501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,
       501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501, 501, 501, 
       501, 501)

advertising_dataset<-data.frame(cbind(Region, advertising))

This is what the dataset looks like:

   Region advertising
1     500     117.913
2     500     120.112
3     500     125.828
4     500     115.354
5     500     177.090
6     500     141.647
7     500     137.892
8     500       0.000
9     500       0.000
10    500       0.000
11    500       0.000
12    500       0.000
13    500       0.000
14    500       0.000
15    500       0.000
16    500       0.000
17    500       0.000
18    500     158.511
19    500     109.385
20    500      91.084

From here, I am applying a lag function in which I take the first value, and then apply a for-loop to transform my dataset.

# Alternative Method Using Loops Proposed by Linh Tran
advertising_dataset$adstocked_advertising = numeric(length(advertising_dataset$advertising))
advertising_dataset$adstocked_advertising[1] = advertising_dataset$advertising[1]

for(i in 2:length(advertising_dataset$advertising)){
  advertising_dataset$adstocked_advertising[i] = advertising_dataset$advertising[i] + adstock_rate * advertising_dataset$adstocked_advertising[i-1]}

The issue I am having is that my dataset is separated by Region. I need to apply this function above (including taking the first value) by region.

Is there a way to do this with the dplyr package?

I know this is wrong but maybe something like this:

library(dplyr)
separated_by_region<- advertising_dataset %>%
group_by(Region) %>%
summarise(
advertising_dataset$adstocked_advertising = 
numeric(length(advertising_dataset$advertising))
advertising_dataset$adstocked_advertising[1] = 
advertising_dataset$advertising[1]

for(i in 2:length(advertising_dataset$advertising)){
  advertising_dataset$adstocked_advertising[i] = 
advertising_dataset$advertising[i] + adstock_rate * 
advertising_dataset$adstocked_advertising[i-1]})

Something along these lines. Not really sure how to do this.

I have a feeling I may have to use the split(advertising_dataset, advertising_dataset$Region) and use an apply function and rbind the results.

Any help would be great, thanks!

Example Output (but function needs to be applied by Region) 1 final dataset in the end:

  Region     advertising     adstocked_advertising
     500         117.913               117.9130000
     500         120.112               179.0685000
     500         125.828               215.3622500
     500         115.354               223.0351250
     500         177.090               288.6075625
     500         141.647               285.9507812
     500         137.892               280.8673906
     500           0.000               140.4336953
     500           0.000                70.2168477
     500           0.000                35.1084238
     500           0.000                17.5542119
     500           0.000                 8.7771060
     500           0.000                 4.3885530
     500           0.000                 2.1942765
     500           0.000                 1.0971382
     500           0.000                 0.5485691
     500           0.000                 0.2742846
     500         158.511               158.6481423
     500         109.385               188.7090711
     500          91.084               185.4385356

Upvotes: 0

Views: 302

Answers (1)

Luke C
Luke C

Reputation: 10336

I don't think this is really what you meant by using dplyr, or that this is better than a do.call(rbind, lapply(...)) approach, but you could define a function like you've described above:

foo <- function(df_) {
  df_$adstocked_advertising = df_$advertising
  for (i in 2:nrow(df_)) {
    df_$adstocked_advertising[i] = df_$advertising[i] + adstock_rate * df_$adstocked_advertising[i - 1]
  }
  return(df_)
}

Then, use your piping to group_by region to apply the function to each group:

library(dplyr)

adv_2 <- data.frame(advertising_dataset %>%
  group_by(Region) %>%
  do(foo(data.frame(.))))


> adv_2[1:10,]
   Region advertising adstocked_advertising
1     500     117.913             117.91300
2     500     120.112             179.06850
3     500     125.828             215.36225
4     500     115.354             223.03512
5     500     177.090             288.60756
6     500     141.647             285.95078
7     500     137.892             280.86739
8     500       0.000             140.43370
9     500       0.000              70.21685
10    500       0.000              35.10842

> adv_2[50:60,]
   Region advertising adstocked_advertising
50    500       0.000              0.401496
51    500       0.000              0.200748
52    500       0.000              0.100374
53    501     134.913            134.913000
54    501     123.112            190.568500
55    501     178.828            274.112250
56    501     112.354            249.410125
57    501     100.090            224.795063
58    501     167.647            280.044531
59    501     177.892            317.914266
60    501       0.000            158.957133

Definitely would need a number check though, it does seem to match your output for the 500 group at least.

Edit:

As per comments, a version where the lag value is adjustable.

foo <- function(df_, lag_val = 1) {
  df_$adstocked_advertising = df_$advertising
  for (i in (1 + lag_val):nrow(df_)) {
    df_$adstocked_advertising[i] = df_$advertising[i] + adstock_rate * df_$adstocked_advertising[i - lag_val]
  }
  return(df_)
}

Default lag is still 1, but now you can change lag_val if you want to skip that many rows of the 'adstocked' column.

adv_2 <- data.frame(advertising_dataset %>%
  group_by(Region) %>%
  do(foo(data.frame(.), lag_val = 3)))

> adv_2
    Region advertising adstocked_advertising
1      500     117.913            117.913000
2      500     120.112            120.112000
3      500     125.828            125.828000
4      500     115.354            174.310500
5      500     177.090            237.146000
6      500     141.647            204.561000
7      500     137.892            225.047250
8      500       0.000            118.573000
9      500       0.000            102.280500
10     500       0.000            112.523625

I think that does what you want, but again definitely worth confirming. Hopefully it will help with the answer in your other linked question, but I'm guessing it will need some modification to be more flexible.

Cheers,

-Luke

Upvotes: 1

Related Questions