Multiple plyr functions and operations in one statement?

Question

I have a dataset as follows:

i,o,c
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN

I want to reshape this dataset into the following form:

i,u,o,c
A,3,4,2
B,1,3,1
C,2,2.5,2

Here, u is the number of rows for each value of i, o is the mean of o within that group (sum of o / u), and c is the number of unique countries in that group.

I can get u with the following plyr statement:

count(df1,vars="i")

I can also get some of the other variables using what I learned from my previous question. I can reach the intended result by laboriously saving each piece to a separate data frame and then joining them all together, but I wonder if there is a one-line way, or simply a better way, of doing this than my current long-winded approach.

Thanks!

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

I don't understand how this is different from your earlier question. The approach is the same:

library(plyr)
ddply(mydf, .(i), summarise, 
      u = length(i), 
      o = mean(o),
      c = length(unique(c)))
#   i u   o c
# 1 A 3 4.0 2
# 2 B 1 3.0 1
# 3 C 2 2.5 2
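
For anyone who wants to check the result without plyr installed, here is a rough base R sketch of the same split-summarise-combine idea. It is not the ddply call itself, just an equivalent under the assumption that the question's data is read into a data frame named mydf; the names pieces and result are made up for illustration.

```r
# Rebuild the question's data; the name mydf matches the snippet above.
mydf <- read.csv(text = "i,o,c
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN", stringsAsFactors = FALSE)

# Split the rows by i, summarise each piece, then bind the pieces together.
# (pieces and result are illustrative names, not part of any package.)
pieces <- lapply(split(mydf, mydf$i), function(d) {
  data.frame(i = d$i[1],
             u = nrow(d),               # number of rows in the group
             o = mean(d$o),             # same as sum(o) / u
             c = length(unique(d$c)))   # number of distinct countries
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
#   i u   o c
# 1 A 3 4.0 2
# 2 B 1 3.0 1
# 3 C 2 2.5 2
```

Note that group C gets c = 2, since its two rows come from MEX and USA.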

If you prefer a data.table solution:

> library(data.table)
> DT <- data.table(mydf)
> DT[, list(u = .N, o = mean(o), c = length(unique(c))), by = "i"]
   i u   o c
1: A 3 4.0 2
2: B 1 3.0 1
3: C 2 2.5 2
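
As a small variation, data.table also has uniqueN(), which can stand in for length(unique(...)). The sketch below assumes a reasonably recent data.table version and rebuilds the question's data by hand so it runs on its own; res is just an illustrative name.

```r
library(data.table)

# Same six rows as in the question, typed in directly.
DT <- data.table(
  i = c("A", "B", "A", "C", "C", "A"),
  o = c(4, 3, 5, 4, 1, 3),
  c = c("USA", "CAN", "USA", "MEX", "USA", "CAN")
)

# .N counts the rows per group; uniqueN(c) counts the distinct countries.
res <- DT[, .(u = .N, o = mean(o), c = uniqueN(c)), by = i]
res
```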
