gabboshow

MATLAB table averages

My MATLAB table looks like:

table_variables =

Meas   Group1    Group2    Subject_id    Age      Sex       Var1   Var2
___    _____     _____     __________    ___    ________    ____   ____

 1     'A'       '0'       1             60     'Male'      48     150
 2     'A'       '0'       1             60     'Male'      47     100
 3     'A'       '0'       1             60     'Male'      48     90
 4     'A'       '0'       1             60     'Male'      38     250
 1     'A'       '3'       2             50     'Male'      50     450
 2     'A'       '3'       2             50     'Male'      43     150      
 3     'A'       '3'       2             50     'Male'      45     100      
 ...
 ...

 1     'B'       '2'       900           66     'Female'    66     170      
 2     'B'       '2'       900           66     'Female'    68     110
 3     'B'       '2'       900           66     'Female'    70     250

For each subject there are multiple measurements of the variables Var1 and Var2. How can I create a new table with the averages for each subject? Please assume that I have many more than 2 variables, so calling varfun once per variable, as below, is not practical:

mean_var1 =    varfun(@mean,T,'InputVariables','Var1',...
        'GroupingVariables','Subject_id')
mean_var2 =    varfun(@mean,T,'InputVariables','Var2',...
        'GroupingVariables','Subject_id')

Answers (1)

sco1

From the documentation for varfun, the 'InputVariables' name-value pair accepts the following:

Variables of A to pass to func, specified as the comma-separated pair consisting of 'InputVariables' and a positive integer, vector of positive integers, variable name, cell array of variable names, or logical vector, or an anonymous function that returns a logical scalar.

So you have a few options. Here is one example, using a logical vector mask:

ID = [1, 1, 1, 2, 2, 2];
var_1 = [1, 2, 3, 4, 5, 6];
var_2 = fliplr(var_1);

t = table(ID.', var_1.', var_2.', 'VariableNames', {'ID', 'var_1', 'var_2'});

varmask = [false true true];
varmeans = varfun(@mean, t, 'InputVariables', varmask, 'GroupingVariables', 'ID');

This returns the following table:

varmeans = 

         ID    GroupCount    mean_var_1    mean_var_2
         __    __________    __________    __________

    1    1     3             2             5         
    2    2     3             5             2         

This approach depends on the position of each variable in the table (here, that the variables to average are the last two columns). For a more robust approach, you would probably want to generate a cell array of the names of the variables to process.
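A middle ground, sketched here as an assumption rather than something the documentation prescribes, is to keep the logical mask but compute it from the variable names, so it no longer depends on column order. The table below is the same made-up toy table as above. Note that this particular mask selects every non-grouping variable, so it only works when all of those are numeric; in the question's table, the text columns (Group1, Group2, Sex) would have to be excluded as well.

```matlab
% Hypothetical toy table with the same layout as the example above
ID = [1; 1; 1; 2; 2; 2];
var_1 = (1:6).';
var_2 = flipud(var_1);
t = table(ID, var_1, var_2);

% Build the mask from the variable names instead of hard-coding positions:
% true for every variable except the grouping variable, wherever it sits
varmask = ~strcmp(t.Properties.VariableNames, 'ID');

varmeans = varfun(@mean, t, 'InputVariables', varmask, ...
    'GroupingVariables', 'ID');
disp(varmeans)
```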

Using the same table t as in the first example:

nvars = 2;
varnames = sprintfc('var_%u', 1:nvars); % Caveat: sprintfc is an undocumented function
varmeans = varfun(@mean, t, 'InputVariables', varnames, 'GroupingVariables', 'ID');

This returns the same result. I used sprintfc here to avoid intermediate steps, but note that it is undocumented, so the usual caveats apply: it may change or be removed in a future release without notice.
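Applied to the question's own table, a minimal sketch might look like the following. It assumes only the rows visible in the question (the "..." rows are elided) and leaves out Group1 and Group2 for brevity. The names varnames and picked are illustrative; both are built with documented functions instead of sprintfc (startsWith needs R2016b or later). Adding Age and Sex to the grouping variables is a design choice not taken from the answer: it assumes those values are constant within each subject, in which case they are simply carried into the result without splitting any subject's rows.

```matlab
% Subset of the rows shown in the question; the elided "..." rows and the
% Group1/Group2 columns are left out to keep the sketch short
Meas       = [1; 2; 3; 4; 1; 2; 3; 1; 2; 3];
Subject_id = [1; 1; 1; 1; 2; 2; 2; 900; 900; 900];
Age        = [60; 60; 60; 60; 50; 50; 50; 66; 66; 66];
Sex        = [repmat({'Male'}, 7, 1); repmat({'Female'}, 3, 1)];
Var1       = [48; 47; 48; 38; 50; 43; 45; 66; 68; 70];
Var2       = [150; 100; 90; 250; 450; 150; 100; 170; 110; 250];
T = table(Meas, Subject_id, Age, Sex, Var1, Var2);

% Documented replacement for sprintfc: build {'Var1', 'Var2', ...}
nvars = 2;
varnames = arrayfun(@(k) sprintf('Var%u', k), 1:nvars, ...
    'UniformOutput', false);

% Alternative: take every variable whose name starts with 'Var'
allnames = T.Properties.VariableNames;
picked = allnames(startsWith(allnames, 'Var'));

% Age and Sex are constant within each subject in these rows, so grouping
% on them as well copies them into the result next to Subject_id
subjectMeans = varfun(@mean, T, 'InputVariables', varnames, ...
    'GroupingVariables', {'Subject_id', 'Age', 'Sex'});
disp(subjectMeans)
```

For subject 1, for instance, the four visible rows give mean_Var1 = (48 + 47 + 48 + 38)/4 = 45.25 and mean_Var2 = (150 + 100 + 90 + 250)/4 = 147.5.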
