Reputation: 3893
I am trying to cast data using cast() from the reshape library, but I am getting unexpected results. I start with a data frame that has lots of data in it, and
all_ia[all_ia$Student.ID == 102050,]
returns
66 102050 1 Mar
67 102050 0 Dec
68 102050 1 May
69 102050 0 Feb
Where the variables are Student.ID, Proficiency.Level, and testmonth respectively.
There are some Student.IDs with a 5th month, Sep.
When I run
all_ia.cast <- cast(all_ia, Student.ID ~ testmonth, value=c("Proficiency.Level"), fill=c("NA"))
and then run
all_ia.cast[all_ia.cast$Student.ID == 102050,]
I get unexpected results:
1325 102050 1 1 1 1 NA
where the variables are Student.ID, Dec, Feb, Mar, May, Sep respectively. There is a warning when I run cast(), which says:
Aggregation requires fun.aggregate: length used as default
My question is: why is fun.aggregate required, and why are the Dec and Feb variables in the cast equal to 1 and not 0?
Thank you for your help!
Upvotes: 1
Views: 118
Reputation: 179558
It's because your casting formula Student.ID ~ testmonth does not contain all of the variables in your data.frame, i.e. Proficiency.Level is not included.
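This means, in general, that the casting has to perform an aggregation, and the aggregation function defaults to length. Since length counts the rows in each cell instead of returning their values, Dec and Feb come out as 1 (one row each) and not 0.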
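To make the difference concrete, here is a minimal base-R sketch. It uses tapply as a stand-in for the grouping that cast performs, and sample_ia is a made-up data frame that copies only the four rows shown in the question, so treat it as an illustration and not as what cast does internally.

```r
# Hypothetical sample mimicking the rows shown in the question
sample_ia <- data.frame(
  Student.ID = c(102050, 102050, 102050, 102050),
  Proficiency.Level = c(1, 0, 1, 0),
  testmonth = c("Mar", "Dec", "May", "Feb")
)

# Group the values by student and month, one cell per combination
groups <- list(sample_ia$Student.ID, sample_ia$testmonth)

# length counts the rows in each cell, so every filled cell becomes 1
counts <- tapply(sample_ia$Proficiency.Level, groups, length)

# mean returns the value itself when a cell holds exactly one row
means <- tapply(sample_ia$Proficiency.Level, groups, mean)

print(counts)
print(means)
```

With one row per student and month, counts is 1 everywhere, while means keeps the original 0 and 1 values.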
You seem to have a special case, where there is a one-to-one relationship between month and proficiency level for each student. Therefore you should choose an aggregation function that preserves the data, e.g. taking the mean.
The following should work:
cast(all_ia, Student.ID ~ testmonth, value="Proficiency.Level", fun.aggregate=mean)
You don't supply test data, so this isn't tested.
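For anyone who wants to try this out, here is a small sketch on made-up data. The data frame demo_ia and its second student (102051) are assumptions for illustration only. The sketch first checks whether any student has more than one row for the same month, then casts with the aggregation function passed through fun.aggregate. fill = NA is set explicitly so that months with no record show up as missing.

```r
library(reshape)

# Hypothetical data shaped like the question's all_ia
demo_ia <- data.frame(
  Student.ID = c(102050, 102050, 102050, 102050, 102051),
  Proficiency.Level = c(1, 0, 1, 0, 1),
  testmonth = c("Mar", "Dec", "May", "Feb", "Sep")
)

# TRUE would mean some student has several rows for the same month,
# in which case an aggregation is genuinely needed
has_duplicates <- any(duplicated(demo_ia[, c("Student.ID", "testmonth")]))
print(has_duplicates)

# Cast to wide format; mean keeps the value when each cell holds one row
wide <- cast(demo_ia, Student.ID ~ testmonth,
             value = "Proficiency.Level",
             fun.aggregate = mean, fill = NA)
print(wide)
```

If has_duplicates comes back TRUE on the real data, it is worth looking at those rows first, because the mean would then blend several proficiency levels into one cell.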
Upvotes: 1