PeterFoster

Reputation: 339

Binning and plotting (Hist) data from a (n,2) matrix

I (mostly) have a prototype script that does what I want, but I'm not a programmer (yet), and what I wrote is very cumbersome. I could use some help turning this into something that scales beyond 10 bins (see below). While we're at it, I would also love to know how to assign a different color to each series.

Briefly, I've got an (n,2) matrix (where n is 20,000 to 40,000) that consists of data for two variables. Typically, I make a scatterplot (or density plot) with each variable on an axis. Now, I want to slice up the data (err, divide the data into bins) along the x axis and plot a histogram of the y values in each bin. I then plot the histograms for all bins on the same plot (preferably in different colors) to see more clearly how the distributions change as x changes.

NOTE: 1) the data is set on a log scale, hence logspace bins. 2) for the sake of argument, pretend that logicleHist is just a regular hist function.

EXAMPLE

%DensPlot Slicer
data=[BFP GFP];
dp_bins=10;
dp_bounds=logspace(1,5,dp_bins);

%bins
b1=data(data(:,1) >= dp_bounds(1) & data(:,1) < dp_bounds(2),:);
b2=data(data(:,1) >= dp_bounds(2) & data(:,1) < dp_bounds(3),:);
b3=data(data(:,1) >= dp_bounds(3) & data(:,1) < dp_bounds(4),:);
b4=data(data(:,1) >= dp_bounds(4) & data(:,1) < dp_bounds(5),:);
b5=data(data(:,1) >= dp_bounds(5) & data(:,1) < dp_bounds(6),:);
b6=data(data(:,1) >= dp_bounds(6) & data(:,1) < dp_bounds(7),:);
b7=data(data(:,1) >= dp_bounds(7) & data(:,1) < dp_bounds(8),:);
b8=data(data(:,1) >= dp_bounds(8) & data(:,1) < dp_bounds(9),:);
b9=data(data(:,1) >= dp_bounds(9) & data(:,1) < dp_bounds(10),:);

figure;
hold on
logicleHist(b1(:,2));
logicleHist(b2(:,2));
logicleHist(b3(:,2));
logicleHist(b4(:,2));
logicleHist(b5(:,2));
logicleHist(b6(:,2));
logicleHist(b7(:,2));
logicleHist(b8(:,2));
logicleHist(b9(:,2));

Suggestions? Thanks!

Upvotes: 1

Views: 544

Answers (2)

Lord Henry Wotton

Reputation: 1362

If I understood your question correctly, you want to histogram the y values (data(:,2)) that fall within each bin of x (data(:,1)). Please see the code below; the comments, and SO, give further explanation.

% The following are custom-created to make the code self-contained, replace with 
% your data and bounds.
x_exp=rand(100,1);                 % log10 exponents of the x values
data(:,1)=10.^x_exp;               % x values on a log scale
data(:,2)=rand(100,1);
dp_bounds=logspace(min(x_exp),max(x_exp),10); % bounds span the same x values

figure('Position',[10 10 800 750],'Color','w');
bar_color=colormap;
bar_color=bar_color(round(linspace(1,size(bar_color,1),numel(dp_bounds))),:); % Select colors per bar; round() keeps the row indices integer
for ii=1:numel(dp_bounds)-1
    sel_data=data(data(:,1) >= dp_bounds(ii) & data(:,1) < dp_bounds(ii+1),2);
    subplot(numel(dp_bounds)-1,1,ii);
    [h,bins_y]=hist(sel_data);
    bar(bins_y,h,'FaceColor', bar_color(ii,:)); % Bar plot with y histograms (auto bins for y)
    title(['x from ',num2str(dp_bounds(ii)),' to ',num2str(dp_bounds(ii+1))],'FontSize', 12)
end

If you copy and paste the code above into the Matlab prompt, you should see something similar to the following figure. [Figure: a column of subplots, one histogram per x bin, each in a different color]

Update: the code above was tested on Matlab 2010. In R2014b or later, where histogram is available, you can replace:

[h,bins_y]=hist(sel_data);
bar(bins_y,h,'FaceColor', bar_color(ii,:));

with histogram(sel_data,'FaceColor', bar_color(ii,:)) (note the lack of a semi-colon) as observed in another solution.
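The question also asked for all histograms on one plot, in different colors. A minimal sketch of that variant follows, assuming R2014b or later (for histogram). The data are a synthetic stand-in, log10 is used in place of the asker's logicleHist, and the names x_edges, y_edges, colors and n_in_slice are illustrative, not from the original post.

```matlab
% Synthetic stand-in for the asker's [BFP GFP] matrix (assumption).
rng(0);                                  % reproducible random numbers
n = 20000;
x = 10.^(1 + 4*rand(n,1));               % x values spread over 1e1..1e5
y = x .* 10.^(0.3*randn(n,1));           % y loosely tracks x, with log-normal scatter
data = [x y];

n_bins  = 9;                             % number of x slices; change freely
x_edges = logspace(1, 5, n_bins + 1);    % n_bins slices need n_bins+1 edges
y_edges = linspace(0, 7, 41);            % shared bin edges for log10(y)
colors  = jet(n_bins);                   % one RGB row per slice

n_in_slice = zeros(n_bins, 1);           % bookkeeping: points per slice
figure; hold on
for k = 1:n_bins
    in_slice = data(:,1) >= x_edges(k) & data(:,1) < x_edges(k+1);
    n_in_slice(k) = nnz(in_slice);
    % Semi-transparent bars so overlapping slices stay visible.
    histogram(log10(data(in_slice,2)), y_edges, ...
        'FaceColor', colors(k,:), 'FaceAlpha', 0.4, 'EdgeColor', 'none');
end
hold off
xlabel('log_{10}(y)'); ylabel('count');
```

Passing the same y_edges to every call keeps the bars aligned across slices, which makes the overlaid distributions directly comparable. Changing n_bins is the only edit needed to use more or fewer slices.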

Upvotes: 1

Dave Kielpinski

Reputation: 1182

The first step might be to use a for loop. Replace everything in your code after

%bins

with

figure
hold on
for i = 1:(dp_bins-1)
     % keep rows whose x value (column 1) falls in bin i
     b = data(data(:,1)>=dp_bounds(i) & data(:,1)<dp_bounds(i+1),:);
     hist(b(:,2))
end

where b plays the role of your b1, b2, ... in turn. Note that histogram is the recommended function in recent releases of Matlab; I only have hist myself.

Note that you can select the second column directly when indexing into data, so that b holds only the y values. I would normally write

b = data(data(:,1)>=dp_bounds(i) & data(:,1)<dp_bounds(i+1),2);
histogram(b)

If you want to overlay so many histograms, I think the plot will get very hard to read no matter what you do with the colors. It's also quite difficult to control the histogram colors with hist. I'd suggest using stem plots, rather than histograms, for each of the bs. This would require another manual binning step over each b, which you could accomplish with a nested for loop.
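A rough sketch of the stem-plot idea follows. It swaps the nested for loop for histcounts (available from R2014b) to do the manual binning of each b. The data are synthetic, log10 stands in for the logicle scaling, and the names x_edges, y_edges, y_centers, counts and colors are illustrative assumptions rather than anything from the question.

```matlab
% Synthetic stand-in for the asker's data matrix (assumption).
rng(1);                                  % reproducible random numbers
n = 5000;
x = 10.^(1 + 4*rand(n,1));               % x values spread over 1e1..1e5
y = x .* 10.^(0.3*randn(n,1));           % y loosely tracks x
data = [x y];

n_bins    = 5;                           % fewer slices keep the overlay readable
x_edges   = logspace(1, 5, n_bins + 1);  % edges of the x slices
y_edges   = linspace(0, 7, 29);          % shared bin edges for log10(y)
y_centers = (y_edges(1:end-1) + y_edges(2:end)) / 2;
colors    = lines(n_bins);               % distinct color per slice

counts = zeros(n_bins, numel(y_centers));
figure; hold on
for k = 1:n_bins
    in_slice = data(:,1) >= x_edges(k) & data(:,1) < x_edges(k+1);
    % Manual binning of the y values that belong to this x slice.
    counts(k,:) = histcounts(log10(data(in_slice,2)), y_edges);
    stem(y_centers, counts(k,:), 'Color', colors(k,:), 'Marker', '.');
end
hold off
xlabel('log_{10}(y)'); ylabel('count');
% One legend entry per slice, labelled with its x range.
legend(arrayfun(@(k) sprintf('x in [%.3g, %.3g)', x_edges(k), x_edges(k+1)), ...
    1:n_bins, 'UniformOutput', false));
```

Because each stem series takes its color from a row of colors, the per-series color control that is awkward with hist becomes a single property here.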

Upvotes: 1
