Reputation: 395
I have a dataset as follows:
i,o,c
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
I want to reshape this dataset into the following form:
i,u,o,c
A,3,4,2
B,1,3,1
C,2,2.5,2
Here, u is the number of rows for each value of i in the dataset, o is the mean of o within that group (sum of o / u), and c is the number of unique countries in that group.
I can get u with the following plyr statement:
count(df1,vars="i")
I can also get some of the other variables using what I learned from my previous question. I can get my intended result the laborious way, by saving each piece to a separate data frame and then joining them all together, but I wonder whether there is a one-line solution, or simply a better way of doing this than my current long-winded approach.
Thanks !
Upvotes: 1
Views: 74
Reputation: 193687
I don't understand how this is different from your earlier question. The approach is the same:
library(plyr)
ddply(mydf, .(i), summarise,
      u = length(i),
      o = mean(o),
      c = length(unique(c)))
#   i u   o c
# 1 A 3 4.0 2
# 2 B 1 3.0 1
# 3 C 2 2.5 2
If you prefer a data.table solution:
> library(data.table)
> DT <- data.table(mydf)
> DT[, list(u = .N, o = mean(o), c = length(unique(c))), by = "i"]
   i u   o c
1: A 3 4.0 2
2: B 1 3.0 1
3: C 2 2.5 2
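For anyone who wants to try this without installing a package, here is a rough base R sketch of the same split-summarise-combine idea. It is not part of either approach above. It assumes the sample data from the question is read into a data frame called mydf (the name used in this answer), and the intermediate name res is just a placeholder.

```r
# Recreate the sample data from the question (assumed layout: columns i, o, c).
mydf <- read.csv(text = "i,o,c
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN", stringsAsFactors = FALSE)

# Split the rows by i, build a one-row summary per group, then stack the rows.
res <- do.call(rbind, lapply(split(mydf, mydf$i), function(g) {
  data.frame(i = g$i[1],
             u = nrow(g),               # number of rows in the group
             o = mean(g$o),             # sum of o / u
             c = length(unique(g$c)))   # number of distinct countries
}))
rownames(res) <- NULL  # drop the group names that rbind adds as row names
res
```

This should give the same three rows as the plyr and data.table versions, including c = 2 for group C (MEX and USA). It is more verbose than either, so it is mainly useful when adding a dependency is not an option.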
Upvotes: 4