Sisse

Reputation: 291

On using plyr and ldply

I have a recurring problem - I apologize!

Say I want to have the baseball data (from the plyr package) listed according to 'id' and 'year'. There is a difference between creating the list according to either:

1. mylist1 <- dlply(baseball, .(id, year), identity)

and

2. mylist2 <- dlply(baseball, .(id), dlply, .(year), identity)

in the way the list is organized. Getting the list back into a data frame works fine with 'mylist1':

mydf1 <- ldply(mylist1)

but not with 'mylist2'

mydf2 <- ldply(mylist2)

which gives the following error message:

Error in list_to_dataframe(res, attr(.data, "split_label")): Result must be all atomic, or all data frames

I am a newbie to R, and this error message doesn't make much sense to me.

I would like to split my own data frame according to method 2, since I need quite a bit of data manipulation. My question is: how can I merge this list into a data frame? Is there an alternative to do.call(rbind, do.call(rbind,...?

I am grateful for any help!

Upvotes: 3

Views: 9401

Answers (1)

Brian Diggs

Reputation: 58825

I agree with @Andrie that this is an odd structure. But I assume that you have a particular reason for doing it this way.

Since it took two passes with dlply to create mylist2, it takes two invocations of ldply to put it back together.

mydf2 <- ldply(mylist2, ldply)

This restores baseball (modulo row ordering):

> class(mydf2)
[1] "data.frame"
> all(dim(mydf2) == dim(baseball))
[1] TRUE
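
To see why a single ldply call fails on the nested list, it may help to inspect one element at each level. The following is a minimal sketch, not part of the original answer: the names bb, nested, flat_plyr, pieces and flat_fill are made up for illustration, and the three-player subset is only an assumption to keep the run short. It also shows one possible alternative to nested do.call(rbind, ...) calls, using unlist(recursive = FALSE) followed by plyr's rbind.fill.

```r
library(plyr)

# Assumption: a three-player subset of baseball, used only to keep the run short.
bb <- subset(baseball, id %in% head(unique(baseball$id), 3))

# Same nested split as mylist2, applied to the subset.
nested <- dlply(bb, .(id), dlply, .(year), identity)

# Each top-level element is itself a list (one entry per year), not a data
# frame, which is what a single ldply() call complains about.
is.data.frame(nested[[1]])
# The data frames sit one level further down.
is.data.frame(nested[[1]][[1]])

# Flatten both levels with plyr, as in the answer above.
flat_plyr <- ldply(nested, ldply)

# Alternative sketch: drop one level of nesting, then row-bind the pieces.
pieces <- unlist(nested, recursive = FALSE)
flat_fill <- rbind.fill(pieces)

# All three should report the same number of rows and columns.
dim(bb)
dim(flat_plyr)
dim(flat_fill)
```

Both routes should give back the same rows as the input, though not necessarily in the original order, so sort on the key columns before comparing contents.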

Upvotes: 5
