Pete900

Reputation: 2176

Converting code from ddply (plyr) to dplyr in R

I want to change a bit of my code to use dplyr instead of plyr's ddply because I think it will be faster on my large (>1e6) data set. Here is an example data set:

ID <- rep(1:3, each=6)
Row <- rep(1, each=18) 
Col <- rep(rep(1:2, each=3), times=3)
Meas <- rnorm(18,3,1)
len <- rep(1:3, times=6)

df <- data.frame(ID, Row, Col, Meas, len)

The code I normally use is this:

library(plyr)  # ddply comes from plyr
res <- ddply(df, c("ID", "Row", "Col"), function(x) coefficients(lm(Meas~len,x)))

It fits an lm of Meas against len for each subset of df defined by ID, Row and Col, and extracts the coefficients. On my large data set it takes 30 seconds (not the end of the world, I know). When I try dplyr with this:

res2 <- df %>% group_by("ID", "Row", "Col") %>% (function(x) coefficients(lm(Meas~len,x))) %>%
  as.data.frame()

I only get one intercept and one gradient. I've read this (extracting p values from multiple linear regression (lm) inside of a ddply function using spatial data), which gave me this attempt:

res3 <- df %>% group_by("ID", "Row", "Col") %>%
  do({model=lm(Meas~len, data=.)
  data.frame(tidy(model),
             glance(model))})

But again no luck. I'm sure I'm missing something simple.

Update:

Out of interest for anyone running a similar thing on large data sets:

system.time(
lres <- ddply(I, c("ERF", "Wafer", "Row", "Col"), function(x) coefficients(lm(Rds.on.fwd~Length,x)))
)

user  system elapsed 
  25.80    0.06   26.02

system.time(
  lres2 <- I %>% group_by(ERF, Wafer, Row, Col) %>% do(
    as.data.frame.list(coef(lm(Rds.on.fwd~Length, data=.))))
  )

user  system elapsed 
  43.12    0.25   44.02 

system.time(
lres3 <- setDT(I)[, as.list(coef(lm(Rds.on.fwd~Length))), .(ERF,Wafer, Row, Col)]
)

user  system elapsed 
  19.77    0.05   19.91

So, @akrun, the data.table option is actually the fastest on my data. Thank you again.

Upvotes: 2

Views: 95

Answers (1)

akrun

Reputation: 887991

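We modify the OP's last piece of code to get the expected output. Note that group_by("ID", "Row", "Col") with quoted names groups by constant strings rather than by the columns, so all rows land in a single group and only one set of coefficients comes back; the bare column names are needed. We group by the variables 'ID', 'Row' and 'Col', fit the lm of 'Meas' on 'len' within each group, extract the coefficients with coef, and convert them to a list and then to a data.frame (as.data.frame.list), which gives two new columns: one for the intercept and one for the slope.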

library(dplyr)
df %>% 
  group_by(ID, Row, Col) %>%                            # bare column names, not strings
  do(as.data.frame.list(coef(lm(Meas~len, data=.))))    # one row of coefficients per group

Or, using data.table, we convert the 'data.frame' to a 'data.table' (setDT does this by reference), group by 'ID', 'Row' and 'Col', fit the lm, extract the coefficients, and convert them to a list so that we get two new columns.

library(data.table)
setDT(df)[, as.list(coef(lm(Meas~len))), .(ID, Row, Col)]
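For reference, the same per-group fit can be sketched in base R with no packages at all, which makes it easier to see what the grouped pipelines above compute. This is only an illustration, not one of the options timed in the question: the seed and the names `groups` and `coefs` are my own choices, and the toy data simply mirrors the example from the question.

```r
# Base-R sketch (no packages); toy data laid out as in the question.
set.seed(1)  # arbitrary seed, only so the example is reproducible
df <- data.frame(
  ID   = rep(1:3, each = 6),
  Row  = rep(1, 18),
  Col  = rep(rep(1:2, each = 3), times = 3),
  Meas = rnorm(18, 3, 1),
  len  = rep(1:3, times = 6)
)

# One data frame per ID/Row/Col combination; drop = TRUE skips empty combinations.
groups <- split(df, list(df$ID, df$Row, df$Col), drop = TRUE)

# Fit one regression per group and stack the coefficient vectors row-wise.
coefs <- t(sapply(groups, function(g) coef(lm(Meas ~ len, data = g))))

nrow(coefs)      # 6: three IDs x one Row x two Cols
colnames(coefs)  # "(Intercept)" "len"
```

Each row of `coefs` corresponds to one group, with the intercept and slope in the two columns, which is the same information the dplyr and data.table versions return alongside the grouping columns.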

Upvotes: 3
