Reputation: 939
I have one column which contains the group ID of each participant. There are three groups so every number in this column is 1, 2 or 3.
Then I have a second column which contains response scores for each participant. I want to calculate the mean/median response score within each group.
I have managed to do this by looping through every row, but I suspect this is a slow and suboptimal solution. Could someone please suggest a better way of doing this?
Upvotes: 3
Views: 8000
Reputation: 911
grpstats (from the Statistics and Machine Learning Toolbox) is a good function to use (documentation here).
It supports a set of built-in statistics, listed in the documentation, and it also accepts function handles (e.g. @mean, @skewness):
>> groups = [1 1 1 2 2 2 3 3 3]';
>> data = [0 0 1 0 1 1 1 1 1]';
>> grpstats(data, groups, {'mean'})
ans =
0.3333
0.6667
1.0000
>> [mea, med] = grpstats(data, groups, {'mean', @median})
mea =
0.3333
0.6667
1.0000
med =
0
1
1
Upvotes: 4
Reputation: 12693
This is a good place to use accumarray
(documentation and blog post):
result = accumarray(groupIDs, data, [], @median);
You can of course pass columns of a matrix (e.g. m(:,1) and m(:,2)) instead of separate variables called groupIDs and data; for this usage accumarray expects the group IDs as a column of positive integers. If you'd prefer the mean instead of the median, use @mean as the 4th argument.
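Note: the documentation warns that if the subscripts (here groupIDs) are not sorted, the order in which each group's values are passed to the function is not guaranteed. Row k of the output still corresponds to group k, and order-independent statistics such as the mean and median are unaffected.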
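A minimal sketch of the accumarray approach applied to the same toy data as the grpstats answer, assuming the group IDs are already positive integers (1, 2, 3) stored as a column vector. Only the function handle changes between the two calls:

```matlab
% Toy data from the grpstats answer: group IDs and response scores as column vectors
groupIDs = [1 1 1 2 2 2 3 3 3]';
data     = [0 0 1 0 1 1 1 1 1]';

% accumarray gathers the elements of data that share a group ID and applies
% the function handle to each group; output row k corresponds to group ID k
groupMeans   = accumarray(groupIDs, data, [], @mean);
groupMedians = accumarray(groupIDs, data, [], @median);

disp([groupMeans groupMedians])
```

If some ID between 1 and max(groupIDs) has no members, accumarray puts its default fill value 0 in that row; passing NaN as the 5th argument (the fill value), e.g. accumarray(groupIDs, data, [], @mean, NaN), makes empty groups easier to spot.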
Upvotes: 2
Reputation: 26069
Use logical indexing. For example, say your data is in a matrix m whose first column is the group ID and whose second column is the response score. Then
mean(m(m(:,1)==1,2))
median(m(m(:,1)==1,2))
will give you the mean and median response score for group 1; change the 1 to 2 or 3 for the other groups.
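The same idea extends to every group without hard-coding the IDs. A minimal sketch, assuming m holds the toy data from the grpstats answer in the two-column layout described above; it loops once per group (found with unique) rather than once per row:

```matlab
% Column 1: group ID, column 2: response score (toy data, assumed layout)
m = [1 0; 1 0; 1 1; 2 0; 2 1; 2 1; 3 1; 3 1; 3 1];

ids = unique(m(:,1));                 % distinct group IDs, sorted ascending
groupMeans   = zeros(numel(ids), 1);
groupMedians = zeros(numel(ids), 1);

% One iteration per group: the logical mask selects that group's scores
for k = 1:numel(ids)
    scores = m(m(:,1) == ids(k), 2);
    groupMeans(k)   = mean(scores);
    groupMedians(k) = median(scores);
end

disp([ids groupMeans groupMedians])
```

Unlike accumarray, this version does not require the IDs to be consecutive positive integers, since each group is matched by value.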
Upvotes: 1