Suedocode
Suedocode

Reputation: 2534

Fill a map with parfor

I am parsing a data file in Matlab to plot it. The data I'm parsing has 3 columns:

category | x | y

The file may have several points for any category. In order to plot them together and get the legend right, I need to sort the data such that each category results in a n x 2 matrix, which represents n points in a given category.

In the following excerpt, don't worry about the parsing, but instead the important part is how the map is appended. category is the key, and newVal is a single x-y pair value that needs to be concatenated to that category's matrix of points.

disp('Sorting categories...')
map = containers.Map();
for i = 2:size(data,2)
    str = strsplit(data{i}, '\t');
    category = strsplit(str{1}, '.');
    category = category{1};
    newVal=sscanf([str{2} ',' str{3}],'%f,%f')';

    %Interesting stuff starts here
    if(isKey(map, category))
        mapVal = [map(category); newVal];
    else
        mapVal = newVal;
    end

    map = [map; containers.Map(category, mapVal)];
end

This takes way too long with 59k points, and I would really like to change the for into parfor. My process requires a read-modify-write on the map variable, which isn't going to work. I'd like the code to do something like this instead (assuming it would be faster, which may not be true anyway):

disp('Sorting categories...')
map = containers.Map();
for i = 2:size(data,2)
    str = strsplit(data{i}, '\t');
    category = strsplit(str{1}, '.');
    category = category{1};
    newVal=sscanf([str{2} ',' str{3}],'%f,%f')';

    maps{i} = containers.Map(category, mapVal);

end
map = [maps{:}];

However, concatenating maps in Matlab causes values to be overwritten rather than appended. This will result in each category only holding the very last point that was parsed for that category. Is there a way to avoid this behavior?

Upvotes: 0

Views: 148

Answers (3)

Humberto77
Humberto77

Reputation: 53

Unfortunately I cannot comment on the accepted answer, as I would of rather inserted this there. I found this question to be a great learning experience since I've never used the maps.containers or accumarray function. I was intrigued by accumarray and attempted to apply it to one of my own problems but I found it to be much slower. In that case I had a vector with 5e6 elements and I was grouping 8e5 categories and taking the L2 norm of the values. I then decided to apply a similar approach to the problem here. I found not using accumarray to be a hair faster. Here is what I did,

tic;
xyByCategory = cell(numel(allCats),1);
for kk = 1:numel(allCats)
    xyByCategory{kk} = xyData(strcmp(categoryData,allCats{kk}),:);
end
toc

using N = 6e6

accumarray approach in the answer,
Elapsed time is 0.611510 seconds.

The above approach,
Elapsed time is 0.429321 seconds.

Upvotes: 0

Edric
Edric

Reputation: 25160

Although I love to promote PARFOR and PARFEVAL, I really think this is a job for ACCUMARRAY. Here's something pretty close to your problem. In the first chunk, I synthesize some data - in your real case, I suggest you aim to read all the data from a file in one (perhaps using TEXTSCAN). Anyway, once you've done that, the idea is to convert the categories into a numeric form, and then call accumarray to collect together results in the same category. accumarray is something of a tricky beast to master, so perhaps someone more skilled in the art can figure out a better way than the two calls I'm making - but on my machine, the accumarray piece runs in 0.02 seconds...

% Generate some data rather than reading from a file
N = 59000;
allCats = {'foo', 'bar', 'baz'};
categoryData = allCats(randi([1,numel(allCats)], N, 1));
xyData = rand(N, 2);

% work out which category each row is in (in your case, you could
% first generate allCats using "unique(categoryData)"
[allCats, ~, whichCategory] = unique(categoryData);

% Use accumarray to build up one cell array for each category, once
% each for the x and y data
tic
xByCategory = accumarray(whichCategory, xyData(:,1), [], @(x){x});
yByCategory = accumarray(whichCategory, xyData(:,2), [], @(x){x});
toc

Upvotes: 1

Humberto77
Humberto77

Reputation: 53

I'm not sure how much time is too long. This script took me to 52s to complete using 6 workers. 22sec for the parfor and 29sec for the merge loop. Its not elegant, but it does apply a parfor loop. You can further speed this up using parfeval and the fetchnext functions. fetchnext will allow you to merge your containers as they become available, thereby performing the merge while workers still build your map containers. I'm guessing this would translate into roughly 30-ish seconds vs the 52s.

Note, parfeval and fetchnext are only available in 2013b and above.

% Create some test data
N = 59000;
aa(:,1) = 10*rand(N,1);
% aa(:,1) = [1 2 2 4 5 6 6 8 9 10];
aa(:,2) = rand(N,1);
aa(:,3) = rand(N,1);
data = strtrim(cellstr(num2str(aa,'%g,%g,%g'))');


ndata = size(data,2);
p = gcp;
if isempty(p)
    NumWorkers = 1;
else
    NumWorkers = p.NumWorkers;
end

tic;
% work out the indices for each worker
numchunks = ceil(ndata/NumWorkers);
for kk = 1:numchunks
    strt = (kk-1)*NumWorkers+1;
    endl = kk*NumWorkers;
    if endl > ndata
        endl = ndata;
    end
    ind{kk} = strt:endl;
end

maps = cell(0);
parfor jj = 1:numel(ind)
%     disp('Sorting categories...')
    maps{jj,1} = containers.Map();
    for i = 1:numel(ind{jj})
        str = strsplit(data{ind{jj}(i)}, ',');
        category = strsplit(str{1}, '.');
        category = category{1};
        newVal=sscanf([str{2} ',' str{3}],'%f,%f')';

        %Interesting stuff starts here
        if(isKey(maps{jj,1}, category))
            mapVal = [maps{jj,1}(category); newVal];
        else
            mapVal = newVal;
        end

        maps{jj,1} = [maps{jj,1}; containers.Map(category, mapVal)];

    end
end

% merge your map containers one-by-one
for kk = 2:numel(maps)
    auxkeys = maps{1}.keys;
    currkeys = maps{2}.keys;
    [ia,ib] = ismember(currkeys,auxkeys);
    if any(ia)
        ia = find(ia);
        for mm = 1:numel(ia)
            maps{1}(auxkeys{ib(ia(mm))}) = [maps{1}(auxkeys{ib(ia(mm))}); maps{2}(currkeys{ia(mm)})];
            remove(maps{2},currkeys{ia(mm)});
        end
    end
    maps{1} = [maps{1}; maps{2}];
    maps(2) = [];
end
toc

Upvotes: 1

Related Questions