Reputation: 1610
I'm having some trouble using factors in functions, or even in basic calculations. I have a data frame something like this (but with as many as 6000 different factor levels).
# ta holds the levels e/f/h/i/j and tt holds the levels a/b/c
df <- data.frame(p = runif(20) * 100,
                 q = sample(1:100, 20, replace = TRUE),
                 ta = c("e","e","f","f","f","i","h","e","i","i","f","f","j","j","h","h","h","e","j","i"),
                 tt = c("a","a","a","b","b","b","a","a","c","c","a","b","a","a","c","c","b","a","c","b"),
                 stringsAsFactors = TRUE)  # needed in R >= 4.0 to get factor columns
Here p (price) and q (quantity) are my variables, and tt and ta are two different factors.
Now, I would first like to find the average price per unit of q for each level of the factor tt:
(p*q) / sum(q) by tt
In this case that would give me 3 values, one each for a, b and c (I have 6000 different factor levels, so I need to do it smartly :) ).
I have tried using split to make lists, and that way I can get one list holding the prices for each tt level and another holding the quantities, but I can't seem to compute, for example, an average from them. I've also tried tapply, but again I can't see how to incorporate the factors.
EDIT: I can see I need to clarify:
I need to find 3 values, the average price per q for each factor level, so in this simplified case it would be:
a: sum of p*q for rows (1, 2, 3, 7, 8, 11, 13, 14, 18) / sum of q for rows (1, 2, 3, 7, 8, 11, 13, 14, 18)
So the result should be the average price for a, b and c, which is just 3 values.
Upvotes: 1
Views: 280
Reputation: 647
If I understood your problem correctly, this should be the answer. Give it a try and let me know, so that I can adjust it if needed.
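myRes <- function(tt) {
  out <- NULL
  for (var in tt) {
    # rows that belong to the current level of tt
    index <- which(df[["tt"]] == var)
    # quantity-weighted average price for this level: sum(p*q) / sum(q)
    out <- c(out, sum(df[index, "p"] * df[index, "q"]) / sum(df[index, "q"]))
  }
  names(out) <- tt
  return(out)
}
# factor() makes this work even if tt was read in as character
threeValue <- myRes(levels(factor(df[, "tt"])))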
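Since the question mentions split and tapply, here is a rough base R sketch of the same per-level calculation without an explicit loop. It rebuilds the example data frame from the question; the set.seed call and the names num, den, avg_price and avg_split are my own additions for illustration, not something from the original post.

```r
# Rebuild the example data; set.seed is an added assumption for reproducibility
set.seed(1)
df <- data.frame(
  p  = runif(20) * 100,
  q  = sample(1:100, 20, replace = TRUE),
  ta = c("e","e","f","f","f","i","h","e","i","i","f","f","j","j","h","h","h","e","j","i"),
  tt = c("a","a","a","b","b","b","a","a","c","c","a","b","a","a","c","c","b","a","c","b"),
  stringsAsFactors = TRUE
)

# Sum of p*q within each level of tt
num <- tapply(df$p * df$q, df$tt, sum)
# Sum of q within each level of tt
den <- tapply(df$q, df$tt, sum)
# Quantity-weighted average price: one value per level (a, b, c)
avg_price <- num / den
print(avg_price)

# Same idea with split(): one sub-data-frame per level, then one number each
avg_split <- sapply(split(df, df$tt), function(d) sum(d$p * d$q) / sum(d$q))
print(avg_split)
```

Both versions should scale to many levels, since tapply and split group by whatever levels tt happens to have.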
Upvotes: 0
Reputation: 60984
I'd use plyr to do this:
library(plyr)
ddply(df, .(tt), mutate, new_col = (p*q) / sum(q))
           p  q ta tt     new_col
1   73.92499 70  e  a 11.29857879
2   58.49011 60  e  a  7.66245932
3   17.23246 27  f  a  1.01588711
4   64.74637 42  h  a  5.93743967
5   55.89372 45  e  a  5.49174103
6   25.87318 83  f  a  4.68880732
7   12.35469 23  j  a  0.62043207
8    1.19060 83  j  a  0.21576367
9   84.18467 25  e  a  4.59523322
10  73.59459 66  f  b 10.07726727
11  26.12099 99  f  b  5.36509998
12  25.63809 80  i  b  4.25528535
13  54.74334 90  f  b 10.22178577
14  69.45430 50  h  b  7.20480246
15  52.71006 97  i  b 10.60762667
16  17.78591 54  i  c  5.16365066
17   0.15036 41  i  c  0.03314388
18  85.57796 30  h  c 13.80289670
19  54.38938 44  h  c 12.86630433
20  44.50439 17  j  c  4.06760541
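Note that mutate keeps all 20 rows and adds each row's share p*q / sum(q) within its group. If a single value per level of tt is wanted, as the edit to the question describes, one possible variant is to swap mutate for summarise and sum the numerator inside each group. This is only a sketch: the column name avg_price and the set.seed call are my own assumptions, and the data frame is rebuilt here so the snippet runs on its own.

```r
library(plyr)

# Rebuild the example data; set.seed is an added assumption for reproducibility
set.seed(1)
df <- data.frame(
  p  = runif(20) * 100,
  q  = sample(1:100, 20, replace = TRUE),
  ta = c("e","e","f","f","f","i","h","e","i","i","f","f","j","j","h","h","h","e","j","i"),
  tt = c("a","a","a","b","b","b","a","a","c","c","a","b","a","a","c","c","b","a","c","b")
)

# summarise collapses each tt group to one row: sum(p*q) / sum(q)
res <- ddply(df, .(tt), summarise, avg_price = sum(p * q) / sum(q))
print(res)
```

The per-row new_col values from the mutate call add up, within each group, to the avg_price reported here.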
plyr does have a reputation for being slow; data.table provides similar functionality, but much higher performance.
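As a rough illustration of the data.table route, the grouped calculation might look like the sketch below. The names dt, res_dt and avg_price are made up for this example, the set.seed call is an added assumption, and I have not benchmarked it against plyr on 6000 levels.

```r
library(data.table)

# Rebuild the example data as a data.table; set.seed is an added assumption
set.seed(1)
dt <- data.table(
  p  = runif(20) * 100,
  q  = sample(1:100, 20, replace = TRUE),
  ta = c("e","e","f","f","f","i","h","e","i","i","f","f","j","j","h","h","h","e","j","i"),
  tt = c("a","a","a","b","b","b","a","a","c","c","a","b","a","a","c","c","b","a","c","b")
)

# One row per level of tt: quantity-weighted average price sum(p*q) / sum(q)
res_dt <- dt[, .(avg_price = sum(p * q) / sum(q)), by = tt]
# Sort by tt so the rows come out as a, b, c
setorder(res_dt, tt)
print(res_dt)
```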
Upvotes: 1