aulky11

Reputation: 79

Organising large datasets in MATLAB

I have a problem I hope you can help me with.

I have imported a large dataset (a 200000 x 5 cell array) into MATLAB with the following columns:

'Year' 'Country' 'X' 'Y' 'Value'

Columns 1 and 5 contain numeric values, while columns 2 to 4 contain strings.

I would like to arrange all this information into a variable that would have the following structure:

NewVariable{Country_1 : Country_n , Year_1 : Year_n}(Y_1 : Y_n , X_1 : X_n)

All I can think of is to loop through the whole dataset and find matches between the Country, Year, X and Y values by combining if and strcmp, but this seems like a very inefficient way of achieving what I am trying to do.

Can anyone help me out?

Thanks in advance.

Upvotes: 1

Views: 161

Answers (1)

EBH

Reputation: 10440

As mentioned in the comments, you can use categorical arrays:

% some arbitrary data:
country = repmat('ca',10,1);
country = [country; repmat('cb',10,1)];
country = [country; repmat('cc',10,1)];
T = table(repmat((2001:2005)',6,1),cellstr(country),...
    cellstr(repmat(['x1'; 'x2'; 'x3'],10,1)),...
    cellstr(repmat(['y1'; 'y2'; 'y3'],10,1)),...
    randperm(30)','VariableNames',{'Year','Country','X','Y','Value'});
% convert all non-number data to categorical arrays:
T.Country = categorical(T.Country);
T.X = categorical(T.X);
T.Y = categorical(T.Y);
% here is an example for using categorical array:
newVar = T(T.Country=='cb' & T.Year==2004,:);

The table class is made for this kind of thing and is very convenient. Just expand the logical expression in the last line, T.Country=='cb' & T.Year==2004, to match your needs. Tell me if this helps ;)
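If you still want the nested layout from the question, NewVariable{Country, Year}(Y, X), here is a minimal sketch of one way to build it from the table. The small cell array C is a hypothetical stand-in for the imported data (without a header row), and the names countries, years, xs and ys are made up for illustration. It assumes each (Country, Year, Y, X) combination appears at most once; if there are duplicates, the last value wins. Combinations that never occur are left as NaN.

```matlab
% Hypothetical stand-in for the imported 200000 x 5 cell (no header row).
C = {2001, 'ca', 'x1', 'y1', 10;
     2001, 'ca', 'x2', 'y1', 20;
     2001, 'ca', 'x1', 'y2', 30;
     2002, 'ca', 'x2', 'y2', 40;
     2001, 'cb', 'x1', 'y1', 50;
     2002, 'cb', 'x2', 'y1', 60};

% Convert the cell array to a table with named columns.
T = cell2table(C, 'VariableNames', {'Year','Country','X','Y','Value'});

% Convert the text columns to categorical arrays, as above.
T.Country = categorical(T.Country);
T.X = categorical(T.X);
T.Y = categorical(T.Y);

% Integer codes for every column, plus the labels they refer to.
[years, ~, iYear] = unique(T.Year);      % sorted unique years
countries = categories(T.Country);       % sorted category names
xs = categories(T.X);
ys = categories(T.Y);
iCountry = double(T.Country);            % category codes 1..numel(countries)
iX = double(T.X);
iY = double(T.Y);

% One Y-by-X matrix per (country, year) pair.
NewVariable = cell(numel(countries), numel(years));
for c = 1:numel(countries)
    for y = 1:numel(years)
        rows = iCountry == c & iYear == y;
        M = nan(numel(ys), numel(xs));   % NaN marks missing combinations
        if any(rows)
            M(sub2ind(size(M), iY(rows), iX(rows))) = T.Value(rows);
        end
        NewVariable{c, y} = M;
    end
end
```

To look up a block by label rather than by position, something like NewVariable{strcmp(countries,'cb'), years == 2002} should work. The loop runs over country/year pairs rather than over all rows, so it stays cheap even for 200000 rows, but for most queries filtering the table directly is simpler.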

Upvotes: 1
