Carl

Reputation: 111

Old code for ddply does not work

I have some code from the book Longitudinal Data Analysis for the Behavioral Sciences Using R (2012) that does not work.

This is what the data looks like:

    subid risk gen eth ell sped  att ell2 risk2 grade read
      1  HHM   F Afr   0    N 0.94   No  DADV     5  172
      1  HHM   F Afr   0    N 0.94   No  DADV     6  185
      1  HHM   F Afr   0    N 0.94   No  DADV     7  179
      1  HHM   F Afr   0    N 0.94   No  DADV     8  194
      2  HHM   F Afr   0    N 0.91   No  DADV     5  200
      2  HHM   F Afr   0    N 0.91   No  DADV     6  210

The code looks like this:

ddply(.data = data.frame(MPLS.LS$read), .variables = .(grade = MPLS.LS$grade),
      each(read.mean = mean), na.rm = FALSE)

It is supposed to give me the mean of read for each of grades 5, 6, 7, and 8. But instead I get these warning messages:

Warning messages:
1: In mean.default(x, ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(x, ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(x, ...) :
  argument is not numeric or logical: returning NA
4: In mean.default(x, ...) :
  argument is not numeric or logical: returning NA 

My question is: why do I get this message? Can I change something in the code to get the result I want?

Any help would be much appreciated because there is a lot of this code in the book that does not work for me.

Upvotes: 1

Views: 913

Answers (1)

Gregor Thomas

Reputation: 145775

Changing my comment to an answer:

ddply can take your full data frame as its .data argument, so you don't need to re-specify the data:

ddply(.data = MPLS.LS, .variables = .(grade), summarize, read.mean = mean(read, na.rm = FALSE))
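To see why the original call warns, here is a minimal sketch using toy data copied from the six rows shown in the question (the real MPLS.LS has more rows and columns, so this is an assumption for illustration only). In the original call each piece handed to mean() is a data frame rather than a numeric vector, and in current versions of R mean() on a data frame returns NA with exactly that warning:

```r
library(plyr)

# Toy data: only the columns needed, taken from the six rows in the question.
MPLS.LS <- data.frame(
  subid = c(1, 1, 1, 1, 2, 2),
  grade = c(5, 6, 7, 8, 5, 6),
  read  = c(172, 185, 179, 194, 200, 210)
)

# A one-column data frame is not numeric, so mean() warns and returns NA.
piece <- data.frame(read = c(172, 200))
piece.mean <- suppressWarnings(mean(piece))
piece.mean

# With summarize, `read` is evaluated as a column inside each grade's piece.
res <- ddply(.data = MPLS.LS, .variables = .(grade), summarize,
             read.mean = mean(read, na.rm = FALSE))
res
```

With these six rows the grade means come out as 186, 197.5, 179 and 194.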

each() is (was) typically used when you want to call each of several functions on one column. Since you have one function, you're better off with summarize.
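As a rough illustration of that use of each(), the sketch below applies two functions to the read column within each grade. The toy data and the names read.stats and read.max are made up for this example and are not from the book:

```r
library(plyr)

# Toy data: only the columns needed, taken from the six rows in the question.
MPLS.LS <- data.frame(
  grade = c(5, 6, 7, 8, 5, 6),
  read  = c(172, 185, 179, 194, 200, 210)
)

# each() bundles several functions into one that returns a named vector.
read.stats <- each(read.mean = mean, read.max = max)

# Pass the numeric column (not the whole piece) to the bundled function.
out <- ddply(MPLS.LS, .(grade), function(piece) read.stats(piece$read))
out
```

The result has one row per grade, with one column for each function named in each().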

ddply (from the plyr package) has been more-or-less replaced by the dplyr package. I would recommend learning these packages from their current documentation rather than from possibly out-of-date textbooks. dplyr has quite a few vignettes that do a nice job of introducing the functionality. The dplyr equivalent for this operation is

library(dplyr)
group_by(MPLS.LS, grade) %>%
  summarize(read.mean = mean(read, na.rm = FALSE))
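If you want to try this without the book's data, here is a self-contained sketch of the same pipeline on toy data copied from the six rows in the question (an assumption for illustration; grade.means is just a name picked for this example). Only dplyr is loaded here, which avoids the name clash between plyr::summarize and dplyr::summarize:

```r
library(dplyr)

# Toy data: only the columns needed, taken from the six rows in the question.
MPLS.LS <- data.frame(
  grade = c(5, 6, 7, 8, 5, 6),
  read  = c(172, 185, 179, 194, 200, 210)
)

# One row per grade, holding the mean of read within that grade.
grade.means <- MPLS.LS %>%
  group_by(grade) %>%
  summarize(read.mean = mean(read, na.rm = FALSE))

grade.means
```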

dplyr is current and fashionable - I like it a lot - but nothing lasts forever.

Upvotes: 1
