elis56

Reputation: 121

Group rows of a matrix based on multiple columns

I have a matrix A whose columns contain, in order: user_id, year, month, and day.

 A=[0 2010 10 19; 
    0 2010 10 19; 
    0 2010 10 18; 
    0 2010 10 18; 
    0 2010 10 17; 
    0 2010 10 17; 
    0 2010 9 20; 
    0 2010 9 19; 
    0 2010 9 19; 
    0 2010 9 19]

I want to divide this data into groups based on month and day. How can I do this in MATLAB?

Upvotes: 2

Views: 1304

Answers (3)

Luis Mendo

Reputation: 112659

Here's another approach:

  1. Sort rows of A based on the appropriate columns;
  2. Detect chunk sizes based on a change in any value from those columns;
  3. Split the sorted matrix based on those sizes.

Code:

key = [3 4]; % columns used as key for splitting
As = sortrows(A, key); % sort based on the key columns (step 1)
chunks = diff([0; find(any(diff(As(:, key), [], 1), 2)); size(A,1)]); % chunk sizes (step 2)
result = mat2cell(As, chunks, size(A,2)); % split with those sizes (step 3)

With your example data

A = [0 2010 10 19; 
     0 2010 10 19; 
     0 2010 10 18; 
     0 2010 10 18; 
     0 2010 10 17; 
     0 2010 10 17; 
     0 2010 9 20; 
     0 2010 9 19; 
     0 2010 9 19; 
     0 2010 9 19];

this gives

>> celldisp(result)
result{1} =
           0        2010           9          19
           0        2010           9          19
           0        2010           9          19
result{2} =
           0        2010           9          20
result{3} =
           0        2010          10          17
           0        2010          10          17
result{4} =
           0        2010          10          18
           0        2010          10          18
result{5} =
           0        2010          10          19
           0        2010          10          19
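
Step 2 packs several operations into one line, so here is a sketch that unrolls it on the example data. The variable names changed and lastRows are only illustrative and do not appear in the code above.

```matlab
% Example data from the question: [user_id year month day]
A = [0 2010 10 19; 0 2010 10 19; 0 2010 10 18; 0 2010 10 18; ...
     0 2010 10 17; 0 2010 10 17; 0 2010 9 20; 0 2010 9 19; ...
     0 2010 9 19; 0 2010 9 19];

key = [3 4];                 % month and day columns
As = sortrows(A, key);       % step 1: rows with equal keys become adjacent

% changed(k) is true when row k+1 differs from row k in any key column
changed = any(diff(As(:, key), [], 1), 2);

% last row index of every chunk except the final one
lastRows = find(changed);

% distances between consecutive chunk ends are the chunk sizes
chunks = diff([0; lastRows; size(As, 1)]);
```

For this data lastRows is [3; 4; 6; 8] and chunks is [3; 1; 2; 2; 2], which matches the five groups shown above.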

Upvotes: 3

Suever

Reputation: 65430

You can use unique to group each row by the month and day (columns 3 and 4) and then use splitapply to perform an operation on each group of rows that share a month/day. The operation that we're going to perform is to simply put them into a cell array: @(x){x}. The following code can accomplish this.

% Assign each unique month/day combination an ID
[~, ~, inds] = unique(A(:,[3 4]), 'rows');

% Divide up the rows based upon this ID
groups = splitapply(@(x){x}, A, inds);

And the result of celldisp(groups)

groups{1} =
           0        2010           9          19
           0        2010           9          19
           0        2010           9          19
groups{2} =
           0        2010           9          20
groups{3} =
           0        2010          10          17
           0        2010          10          17
groups{4} =
           0        2010          10          18
           0        2010          10          18
groups{5} =
           0        2010          10          19
           0        2010          10          19

That being said, it may be possible for you to do your analysis without actually splitting data into different pieces. This will likely be more performant as MATLAB is optimized for matrix operations.
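
As one illustration of working without splitting, here is a minimal sketch that counts the rows for each month/day combination directly from the group IDs. Counting is only an assumed example of an analysis; the names keys, counts and summary are illustrative.

```matlab
% Example data from the question: [user_id year month day]
A = [0 2010 10 19; 0 2010 10 19; 0 2010 10 18; 0 2010 10 18; ...
     0 2010 10 17; 0 2010 10 17; 0 2010 9 20; 0 2010 9 19; ...
     0 2010 9 19; 0 2010 9 19];

% keys holds one row per distinct month/day; inds maps each row of A to its key
[keys, ~, inds] = unique(A(:, [3 4]), 'rows');

% Add 1 into the bin of each row's group ID, giving the row count per group
counts = accumarray(inds, 1);

% Columns: month, day, number of rows
summary = [keys counts];
```

Here summary comes out as [9 19 3; 9 20 1; 10 17 2; 10 18 2; 10 19 2], and no cell array is created along the way.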

More information on splitapply and on grouping your data is available in the MATLAB documentation.

Upvotes: 4

rayryeng

Reputation: 104474

splitapply is a nice function (I had actually never heard of it until now), but it requires R2015b or later. If you don't have it, a call to accumarray may be prudent. Using the same logic of assigning each unique month/day combination an ID, you can use accumarray to collect the row indices that share the same ID, then slice into the matrix with those indices and return one matrix subset per ID in a cell array, much like what splitapply does.

Something like this could work:

% Assign each unique month/day combination an ID
% From Suever
[~,~,inds] = unique(A(:,[3 4]), 'rows');

% Divide up the rows based upon this ID
groups = accumarray(inds, (1:size(A,1)).', [], @(x) {A(x,:)});

We thus get:

>> format compact
>> celldisp(groups)
groups{1} =
           0        2010           9          19
           0        2010           9          19
           0        2010           9          19
groups{2} =
           0        2010           9          20
groups{3} =
           0        2010          10          17
           0        2010          10          17
groups{4} =
           0        2010          10          18
           0        2010          10          18
groups{5} =
           0        2010          10          19
           0        2010          10          19
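
One caveat, as far as I know from the accumarray documentation: when the subscripts are not sorted, the order of the values handed to the function is not guaranteed. If the original row order inside each group matters, a hedged variant is to sort the indices first. With the example data the rows of a group are identical, so the difference is not visible here.

```matlab
% Example data from the question: [user_id year month day]
A = [0 2010 10 19; 0 2010 10 19; 0 2010 10 18; 0 2010 10 18; ...
     0 2010 10 17; 0 2010 10 17; 0 2010 9 20; 0 2010 9 19; ...
     0 2010 9 19; 0 2010 9 19];

% Assign each unique month/day combination an ID
[~, ~, inds] = unique(A(:, [3 4]), 'rows');

% sort(x) puts the row indices of each group back in ascending order,
% so rows within a group keep their original relative order
groups = accumarray(inds, (1:size(A,1)).', [], @(x) {A(sort(x), :)});
```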

Upvotes: 3
