Reputation: 13
I have a data table (temp3) that looks like this (the original table has around 1 million rows):
creative_code reqcount hasbought numclick FeedbackCPM bidvalue_CPMf browser
79 5 1 0 19 9 C
1 0 0 0 39 50 C
79 3 1 0 1205 684 C
1 7 1 5 82 159 C
1 9 0 3 15 77 C
79 5 0 0 1575 700 C
1 0 0 0 95 300 C
1 4 1 4 95 300 C
1 3 0 0 1 300 C
1 8 0 0 30 65 C
1 9 1 0 17 293 C
1 4 0 1 140 300 IE
79 4 0 0 838 271 F
79 7 1 2 0 13 C
1 9 2 0 67 160 C
79 2 0 0 268 176 F
79 0 1 23 1634 700 C
79 1 0 0 0 300 C
79 5 0 0 143 87 C
79 7 2 0 0 9 IE
1 3 0 0 178 300 IE
1 7 0 0 111 200 F
What I need is the mean of bidvalue_CPMf for each creative_code, grouped separately by reqcount, hasbought, and numclick. I can get the mean for creative_code + reqcount on its own with:
aggregate(bidvalue_CPMf~creative_code+reqcount,data=temp3,FUN=mean)
However, when I try to loop over the columns with the following code, I get an error:
for (j in names(temp3)) aggregate(bidvalue_CPMf~creative_code+j,data=temp3,FUN=mean)
Error:
Error in model.frame.default(formula = bidvalue_CPMf ~ creative_code + : variable lengths differ (found for 'j')
Please help.
Upvotes: 1
Views: 4342
Reputation: 446
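What you need is as.formula. Inside the loop, j is a character string holding a column name, not the column itself, so aggregate() looks for a variable literally called j and finds a length-1 vector. Build the formula as a string and convert it:
# Read the sample data copied to the clipboard (skip this if the data is already loaded)
df <- read.table("clipboard", header = TRUE)
# Every column except the response and the fixed grouping variable
Columns <- names(df)[!names(df) %in% c("bidvalue_CPMf", "creative_code")]
for (j in Columns) {
  # Paste the column name into the formula text, then convert it to a formula
  fo <- as.formula(paste("bidvalue_CPMf ~ creative_code +", j))
  print(aggregate(fo, data = df, FUN = mean))
}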
If you only need the analysis for reqcount, hasbought, and the click column (named numclick in the sample data), use:
Columns <- c("reqcount", "hasbought", "numclick")
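As an alternative to pasting the formula text by hand, base R's reformulate() builds the same formula from a character vector of term names. The following is only a sketch: toy is a made-up data frame holding a few rows copied from the sample data, and the names toy, cols, and res are illustrative, not from the original post. The results are collected in a named list instead of being printed.

```r
# A few rows copied from the sample data in the question (illustrative subset)
toy <- data.frame(
  creative_code = c(79, 1, 79, 1, 1, 79),
  reqcount      = c(5, 0, 3, 7, 9, 5),
  hasbought     = c(1, 0, 1, 1, 0, 0),
  numclick      = c(0, 0, 0, 5, 3, 0),
  bidvalue_CPMf = c(9, 50, 684, 159, 77, 700)
)

cols <- c("reqcount", "hasbought", "numclick")

# reformulate() turns term labels into a formula, e.g.
# bidvalue_CPMf ~ creative_code + reqcount
res <- lapply(cols, function(j) {
  fo <- reformulate(c("creative_code", j), response = "bidvalue_CPMf")
  aggregate(fo, data = toy, FUN = mean)
})
names(res) <- cols
```

Each element can then be looked up by column name, for example res$hasbought.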
Upvotes: 3
Reputation: 887098
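You can loop over the column names with lapply and pass the grouping columns to aggregate through by:
# Columns 2 to 4: reqcount, hasbought, numclick
nm1 <- names(temp3)[2:4]
lapply(nm1, function(x) {
  aggregate(temp3['bidvalue_CPMf'], by = c(temp3['creative_code'], temp3[x]),
            FUN = mean)
})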
[[1]]
creative_code reqcount bidvalue_CPMf
1 1 0 175.0000
2 79 0 700.0000
3 79 1 300.0000
4 79 2 176.0000
5 1 3 300.0000
6 79 3 684.0000
7 1 4 300.0000
8 79 4 271.0000
9 79 5 265.3333
10 1 7 179.5000
11 79 7 11.0000
12 1 8 65.0000
13 1 9 176.6667
[[2]]
creative_code hasbought bidvalue_CPMf
1 1 0 199.0000
2 79 0 306.8000
3 1 1 250.6667
4 79 1 351.5000
5 1 2 160.0000
6 79 2 9.0000
[[3]]
creative_code numclick bidvalue_CPMf
1 1 0 208.5
2 79 0 279.5
3 1 1 300.0
4 79 2 13.0
5 1 3 77.0
6 1 4 300.0
7 1 5 159.0
8 79 23 700.0
Checking the second result against the individual formula call:
aggregate(bidvalue_CPMf~creative_code+hasbought, temp3, FUN=mean)
creative_code hasbought bidvalue_CPMf
1 1 0 199.0000
2 79 0 306.8000
3 1 1 250.6667
4 79 1 351.5000
5 1 2 160.0000
6 79 2 9.0000
If the dataset is large, you may use dplyr or data.table.
library(dplyr)
lapply(nm1, function(x) {
  temp3 %>%
    group_by_('creative_code', .dots = x) %>%
    summarise(bidvalue_CPMf = mean(bidvalue_CPMf))
})
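Note that group_by_() has been deprecated since dplyr 0.7. A possible equivalent for newer versions, sketched here under the assumption that dplyr 1.0.0 or later is installed, selects the grouping columns by name with across() and all_of(). The toy data frame and the names toy, cols, and res are made up for illustration.

```r
library(dplyr)

# A few rows copied from the sample data in the question (illustrative subset)
toy <- data.frame(
  creative_code = c(79, 1, 79, 1, 1, 79),
  reqcount      = c(5, 0, 3, 7, 9, 5),
  hasbought     = c(1, 0, 1, 1, 0, 0),
  numclick      = c(0, 0, 0, 5, 3, 0),
  bidvalue_CPMf = c(9, 50, 684, 159, 77, 700)
)

cols <- c("reqcount", "hasbought", "numclick")

# Group by creative_code plus one extra column given as a string,
# then take the mean of bidvalue_CPMf within each group
res <- lapply(cols, function(x) {
  toy %>%
    group_by(across(all_of(c("creative_code", x)))) %>%
    summarise(bidvalue_CPMf = mean(bidvalue_CPMf), .groups = "drop")
})
```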
Using data.table:
library(data.table)
setDT(temp3)
lapply(nm1, function(x) temp3[, .(bidvalue_CPMf = mean(bidvalue_CPMf)),
                              c('creative_code', x)])
Upvotes: 0