Reputation: 15708
I frequently have a table with a combination of columns that acts as a grouping key/common identifier, such that the key may be repeated across rows. A simple example:
sampleId = [1 1 1 3 3 3]';
entity = [1 2 3 1 4 5]';
dataTable = table(sampleId, entity)
where here, the observations of entity
can be thought to be attached to sample 1, and 3.
I find it useful to compress this data so that the key is unique amongst the rows, for example, I would like a final table that looks like:
----------------------------
| sampleId | entity |
----------------------------
| 1 | 3x1 table |
| 3 | 3x1 table |
----------------------------
The only way I know of doing this is to use a for loop as follows:
tempCell = cell(length(unique(dataTable.sampleId)), 1);
counter = 1;
nonGroupVariables = dataTable.Properties.VariableNames(...
~ismember(dataTable.Properties.VariableNames,'sampleId'));
for sampleId = unique(dataTable.sampleId)'
tempCell(counter) = {dataTable(dataTable.sampleId == sampleId, nonGroupVariables)};
counter = counter + 1;
end
newDataTable = table(unique(dataTable.sampleId), tempCell, 'VariableNames', ['sampleId', nonGroupVariables]);
Are there any better ways (more efficient/faster) of achieving this, maybe using accummarray
, or grouping?
Upvotes: 2
Views: 106
Reputation: 112699
You can use accumarray
indeed. I'll distinguish two cases:
Of course the second case includes the first, but it's easier to consider the first and then proceed with the second.
sampleId = [1 1 1 3 3 3]';
sampleId2 = [1 1 2 3 2 2]';
entity = [1 2 3 1 4 5]'; %'
dataTable = table(sampleId, sampleId2, entity); %// example data
n = 2; %// number of grouping variables
[u, ~, v] = unique(dataTable{:,1:n}, 'rows');
c = accumarray(v, dataTable{:,n+1}, [], @(x) {x}); %// cell array of vectors,
%// where each vector refers to one value of the grouping variable
ut = mat2cell(u, size(u,1), ones(1,n)); %// convert to cell array
compressedTable = [table(ut{:}, 'VariableNames', dataTable.Properties.VariableNames(1:n)) ...
cell2table(c, 'VariableNames', dataTable.Properties.VariableNames(n+1))];
%// create output table with correct variable names
This produces a table such that
Note that curly-bracket indexing into the table is used to make the code independendent of the table variable names. In the above example, the result is
>> compressedTable
compressedTable =
sampleId sampleId2 entity
________ _________ ____________
1 1 [2x1 double]
1 2 [ 3]
3 2 [2x1 double]
3 3 [ 1]
>> compressedTable.entity{1}
ans =
2
1
>> compressedTable.entity{2}
ans =
3
>> compressedTable.entity{3}
ans =
4
5
>> compressedTable.entity{4}
ans =
1
In this case you probably need to loop over the columns other than the first. In the following I use arrayfun
for the looping.
sampleId = [1 1 1 3 3 3]';
sampleId2 = [1 1 2 3 2 2]';
entity = [1 2 3 1 4 5]'; %'
entity2 = entity*2;
dataTable = table(sampleId, sampleId2, entity, entity2); %// example data
n = 2; %// number of grouping variables
[u, ~, v] = unique(dataTable{:,1:n}, 'rows');
c = arrayfun(@(n) accumarray(v, dataTable{:,n}, [], @(x) {x}), n+1:size(dataTable,2), ...
'uniformoutput', 0); %// cell array of cell arrays of vectors
ut = mat2cell(u, size(u,1), ones(1,n)); %// convert to cell array
compressedTable = [table(ut{:}, 'VariableNames', dataTable.Properties.VariableNames(1:n)) ...
cell2table([c{:}], 'VariableNames', dataTable.Properties.VariableNames(n+1:end))];
%// create output table with correct variable names
The result is
compressedTable =
sampleId sampleId2 entity entity2
________ _________ ____________ ____________
1 1 [2x1 double] [2x1 double]
1 2 [ 3] [ 6]
3 2 [2x1 double] [2x1 double]
3 3 [ 1] [ 2]
>> compressedTable.entity{1}
ans =
2
1
>> compressedTable.entity2{1}
ans =
4
2
>> compressedTable.entity{2}
ans =
3
>> compressedTable.entity2{2}
ans =
6
>> compressedTable.entity{3}
ans =
4
5
>> compressedTable.entity2{3}
ans =
8
10
>> compressedTable.entity{4}
ans =
1
>> compressedTable.entity2{4}
ans =
2
Upvotes: 1
Reputation: 15708
I found another method using varfun
:
compressedTable = varfun(@(x){x}, dataTable, 'GroupingVariables', 'sampleId');
compressedTable.GroupCount = [];
compressedTable.Properties.VariableNames = dataTable.Properties.VariableNames;
Upvotes: 0