Reputation: 1848
Suppose that a data set can be divided into three groups (e.g., negatives, zeroes, and positives) and that one wants to represent these groups in a plot using different symbols, assigning < to negatives, o to zeroes, and > to positives; e.g.,
gscatter([-1 -1 0 0 1 1],[1 2 1 2 1 2],[-1 -1 0 0 1 1],'k','<o>',10); xlim([-3 3]); ylim([0 3]);
Suppose further that the input data to the gscatter function do not include every group. The group-symbol relationship could then change because, according to the MATLAB documentation, gscatter assigns symbols from the provided list sequentially to groups, following the sorted order of the unique values of the grouping variable. The upshot is that when an earlier-sorting group is absent, the symbol/group assignment shifts, destroying symbolic significance. (The precise symbol assigned to a group may often be immaterial, but this question concerns cases where a particular symbol must always be assigned to a particular group.) For instance, for a data set lacking negative values, gscatter would assign the < symbol to zeroes and the o symbol to positives, with the > symbol going unused because a third symbol is extraneous when only two distinct groups exist; e.g.,
gscatter([0 0 1 1],[1 2 1 2],[0 0 1 1],'k','<o>',10); xlim([-3 3]); ylim([0 3]);
My question is whether one can deterministically assign a symbol to a particular group when some groups may be missing from a data set (e.g., can one mandate assignment of < to negative values even when no such values are present, to avoid the shift in the symbol/group relationship described above). The MATLAB documentation seems to indicate that such an operation is impossible, meaning that one would have to rely on a series of if statements to determine which groups are missing and to redefine a restricted symbol set for each possible combination of groups present, but I would like to know whether this limitation can be circumvented more elegantly.
Upvotes: 0
Views: 418
Reputation: 20974
In general, for "processing data that may or may not be present" problems, there's always the horrible cheat of forcing the necessary parts of the data to exist:
x = [-1 -1 0 0 1 1];
y = [1 2 1 2 1 2];
group = [-1 -1 0 0 1 1];
gscatter([x NaN NaN NaN], [y NaN NaN NaN], [group -1 0 1], 'k', '<o>', 10);
(If, unlike plot, gscatter simply ignores a group containing only NaN, which I can't test because I don't have it available, you could instead use any coordinates well outside the final axes range.)
Another possibility is changing the process itself to ensure consistency regardless of the data:
scatter(x(group == -1), y(group == -1), 10, 'k', '<');
hold on;
scatter(x(group == 0), y(group == 0), 10, 'k', 'o');
scatter(x(group == 1), y(group == 1), 10, 'k', '>');
hold off;
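If there are more than a few groups, the same idea can be written as a loop over a fixed list of group values. This is only a sketch: the names groupValues, markers, and used are illustrative, the sample data deliberately contain no negatives, and empty groups are skipped rather than plotted.

```matlab
% Sample data with no negative values (the problematic case).
x = [0 0 1 1];
y = [1 2 1 2];
group = [0 0 1 1];

% Fixed, data-independent pairing of group value and marker.
groupValues = [-1 0 1];
markers = '<o>';

used = '';  % markers actually drawn, kept only for inspection
hold on;
for k = 1:numel(groupValues)
    idx = (group == groupValues(k));   % logical mask for this group
    if any(idx)                        % skip groups absent from the data
        scatter(x(idx), y(idx), 10, 'k', markers(k));
        used(end+1) = markers(k);
    end
end
hold off;
```

Because each marker is looked up by its position in groupValues rather than by the order of the groups found in the data, a missing group cannot shift the symbols of the others.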
However, in this case explicitly checking the data and adjusting for what is present is almost certainly the nicest approach, given an appropriate MATLAB idiom:
markers = '<o>';
midx = ismember([-1 0 1], group);
gscatter(x, y, group, 'k', markers(midx), 10);
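To see what that idiom does when a group is missing, here is a small sketch using data without negatives. The variable names allGroups, present, and selected are just for illustration, and gscatter itself requires the Statistics and Machine Learning Toolbox.

```matlab
% Sample data lacking the negative group.
x = [0 0 1 1];
y = [1 2 1 2];
group = [0 0 1 1];

allGroups = [-1 0 1];   % every group that could occur, in sorted order
markers = '<o>';        % one marker per entry of allGroups

% Logical mask: which of the possible groups occur in the data?
present = ismember(allGroups, group);

% Keep only the markers of the groups that are present.
selected = markers(present);

% gscatter now receives exactly one marker per group it will find.
gscatter(x, y, group, 'k', selected, 10);
xlim([-3 3]); ylim([0 3]);
```

Here present should come out as [0 1 1], so selected is 'o>' and the zeroes keep o while the positives keep >. This relies on allGroups being listed in the same sorted order that gscatter uses for the unique group values.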
Upvotes: 2