Reputation: 4243
I have a dataset as follows:
# Define Adstock Rate
adstock_rate = 0.50
# Create Data
advertising = c(117.913, 120.112, 125.828, 115.354, 177.090, 141.647, 137.892, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 158.511, 109.385, 91.084, 79.253, 102.706,
78.494, 135.114, 114.549, 87.337, 107.829, 125.020, 82.956, 60.813, 83.149, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 129.515, 105.486, 111.494, 107.099, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
134.913, 123.112, 178.828, 112.354, 100.090, 167.647, 177.892, 0.000, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 112.511, 155.385, 123.084, 89.253, 67.706,
23.494, 122.114, 112.549, 65.337, 134.829, 123.020, 81.956, 23.813, 65.149, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 145.515, 154.486, 121.494, 117.099, 0.000, 0.000, 0.000,
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000
)
Region = c(500, 500, 500, 500, 500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,
500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500,500, 500, 500, 500, 500, 500,
500, 500,
501, 501, 501, 501, 501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,
501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501,501, 501, 501, 501, 501, 501,
501, 501)
advertising_dataset<-data.frame(cbind(Region, advertising))
This is what the dataset looks like:
Region advertising
1 500 117.913
2 500 120.112
3 500 125.828
4 500 115.354
5 500 177.090
6 500 141.647
7 500 137.892
8 500 0.000
9 500 0.000
10 500 0.000
11 500 0.000
12 500 0.000
13 500 0.000
14 500 0.000
15 500 0.000
16 500 0.000
17 500 0.000
18 500 158.511
19 500 109.385
20 500 91.084
From here, I am applying a lag function in which I take the first value, and then apply a for-loop to transform my dataset.
# Alternative Method Using Loops Proposed by Linh Tran
advertising_dataset$adstocked_advertising = numeric(length(advertising_dataset$advertising))
advertising_dataset$adstocked_advertising[1] = advertising_dataset$advertising[1]
for(i in 2:length(advertising_dataset$advertising)){
advertising_dataset$adstocked_advertising[i] = advertising_dataset$advertising[i] + adstock_rate * advertising_dataset$adstocked_advertising[i-1]}
The issue I am having is that my dataset is separated by Region. I need to apply this function above (including taking the first value) by region.
Is there a way to do this with the dplyr package?
I know this is wrong but maybe something like this:
library(dplyr)
separated_by_region<- advertising_dataset %>%
group_by(Region) %>%
summarise(
advertising_dataset$adstocked_advertising =
numeric(length(advertising_dataset$advertising))
advertising_dataset$adstocked_advertising[1] =
advertising_dataset$advertising[1]
for(i in 2:length(advertising_dataset$advertising)){
advertising_dataset$adstocked_advertising[i] =
advertising_dataset$advertising[i] + adstock_rate *
advertising_dataset$adstocked_advertising[i-1]})
Something along these lines. Not really sure how to do this.
I have a feeling I may have to use the split(advertising_dataset, advertising_dataset$Region) and use an apply function and rbind the results.
Any help would be great, thanks!
Example Output (but function needs to be applied by Region) 1 final dataset in the end:
Region advertising adstocked_advertising
500 117.913 117.9130000
500 120.112 179.0685000
500 125.828 215.3622500
500 115.354 223.0351250
500 177.090 288.6075625
500 141.647 285.9507812
500 137.892 280.8673906
500 0.000 140.4336953
500 0.000 70.2168477
500 0.000 35.1084238
500 0.000 17.5542119
500 0.000 8.7771060
500 0.000 4.3885530
500 0.000 2.1942765
500 0.000 1.0971382
500 0.000 0.5485691
500 0.000 0.2742846
500 158.511 158.6481423
500 109.385 188.7090711
500 91.084 185.4385356
Upvotes: 0
Views: 302
Reputation: 10336
I don't think this is really what you meant by using dplyr
, or that this is better than a do.call(rbind, lapply(...))
approach, but you could define a function like you've described above:
foo <- function(df_) {
df_$adstocked_advertising = df_$advertising
for (i in 2:nrow(df_)) {
df_$adstocked_advertising[i] = df_$advertising[i] + adstock_rate * df_$adstocked_advertising[i - 1]
}
return(df_)
}
Then, use your piping to group_by
region to apply the function to each group:
library(dplyr)
adv_2 <- data.frame(advertising_dataset %>%
group_by(Region) %>%
do(foo(data.frame(.))))
> adv_2[1:10,]
Region advertising adstocked_advertising
1 500 117.913 117.91300
2 500 120.112 179.06850
3 500 125.828 215.36225
4 500 115.354 223.03512
5 500 177.090 288.60756
6 500 141.647 285.95078
7 500 137.892 280.86739
8 500 0.000 140.43370
9 500 0.000 70.21685
10 500 0.000 35.10842
> adv_2[50:60,]
Region advertising adstocked_advertising
50 500 0.000 0.401496
51 500 0.000 0.200748
52 500 0.000 0.100374
53 501 134.913 134.913000
54 501 123.112 190.568500
55 501 178.828 274.112250
56 501 112.354 249.410125
57 501 100.090 224.795063
58 501 167.647 280.044531
59 501 177.892 317.914266
60 501 0.000 158.957133
Definitely would need a number check though, it does seem to match your output for the 500 group at least.
Edit:
As per comments, a version where the lag value is adjustable.
foo <- function(df_, lag_val = 1) {
df_$adstocked_advertising = df_$advertising
for (i in (1 + lag_val):nrow(df_)) {
df_$adstocked_advertising[i] = df_$advertising[i] + adstock_rate * df_$adstocked_advertising[i - lag_val]
}
return(df_)
}
Default lag is still 1, but now you can change lag_val
if you want to skip that many rows of the 'adstocked' column.
adv_2 <- data.frame(advertising_dataset %>%
group_by(Region) %>%
do(foo(data.frame(.), lag_val = 3)))
> adv_2
Region advertising adstocked_advertising
1 500 117.913 117.913000
2 500 120.112 120.112000
3 500 125.828 125.828000
4 500 115.354 174.310500
5 500 177.090 237.146000
6 500 141.647 204.561000
7 500 137.892 225.047250
8 500 0.000 118.573000
9 500 0.000 102.280500
10 500 0.000 112.523625
I think that does what you want, but again definitely worth confirming. Hopefully it will help with the answer in your other linked question, but I'm guessing it will need some modification to be more flexible.
Cheers,
-Luke
Upvotes: 1