ivan

Reputation: 93

How does ddply split the data?

I have this data frame.

mydf <- data.frame(c("a", "a", "b", "b", "c", "c"),
                   c("e", "e", "e", "e", "e", "e"),
                   c(1, 2, 3, 10, 20, 30),
                   c(5, 10, 20, 20, 15, 10))
colnames(mydf) <- c("Model", "Class", "Length", "Speed")

I'm trying to get a better understanding of how ddply works.

I'd like to get the average length and speed for each pairing of model and class.

I know this is one way to do it: ddply(mydf, .(Model, Class), .fun = summarize, mSpeed = mean(Speed), mLength = mean(Length)).

I wonder whether ddply can compute the mean of every column without my specifying each one individually.

I tried ddply(mydf, .(Model, Class), .fun = mean), but I get this warning:

Warning messages: 1: In mean.default(piece, ...) : argument is not numeric or logical: returning NA

What does ddply pass on to the function argument? Is there a way to apply one function to every column using ddply?

My goal is to learn more about ddply, so I will only accept answers that use ddply.

Upvotes: 0

Views: 62

Answers (1)

Ryan John

Reputation: 1430

Here's a solution using dplyr's group_by and summarize_if functions.



library(dplyr)


mydf <- data.frame(c("a", "a", "b", "b", "c", "c"),
                   c("e", "e", "e", "e", "e", "e"),
                   c(1, 2, 3, 10, 20, 30),
                   c(5, 10, 20, 20, 15, 10))
colnames(mydf) <- c("Model", "Class", "Length", "Speed")

#summarize data by Model & Class
mydf %>%  group_by(Model, Class) %>% summarize_if(is.numeric, mean)


#> # A tibble: 3 x 4
#> # Groups:   Model [3]
#>   Model Class Length Speed
#>   <fct> <fct>  <dbl> <dbl>
#> 1 a     e        1.5   7.5
#> 2 b     e        6.5  20  
#> 3 c     e       25    12.5

Created on 2019-04-16 by the reprex package (v0.2.1)
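
Since the question asks about ddply specifically, here is a rough plyr sketch of the same summary. It relies on ddply handing .fun one data-frame piece per Model/Class combination (all four columns, not a single numeric vector), which is why mean() on its own returns NA with a warning. The names shape and result are made up for illustration, and this is one possible approach, not the only one.

```r
library(plyr)

mydf <- data.frame(
  Model  = c("a", "a", "b", "b", "c", "c"),
  Class  = c("e", "e", "e", "e", "e", "e"),
  Length = c(1, 2, 3, 10, 20, 30),
  Speed  = c(5, 10, 20, 20, 15, 10)
)

# Inspect what .fun receives: each piece is a data frame holding the rows
# of one Model/Class combination, including the grouping columns themselves.
shape <- ddply(mydf, .(Model, Class), function(piece) {
  data.frame(n_rows = nrow(piece), n_cols = ncol(piece))
})
print(shape)

# numcolwise() wraps a function so that it is applied to every numeric
# column of the piece, so no column has to be named one at a time.
result <- ddply(mydf, .(Model, Class), numcolwise(mean))
print(result)
```

The second call should give the same Length and Speed means as the tibble above. plyr also offers colwise() for applying a function to all columns, or to a chosen subset, regardless of type.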

Upvotes: 0
