Reputation: 1144
I have a matlab dataset that looks like this:
year value
1995 90000
1995 53000
1995 80000
1995 60000
1995 37000
1995 42000
1995 13102
1996 35000
1996 50000
1996 32000
1996 47000
1997 36000
1997 90000
1997 NaN
1997 90000
1997 51500
1997 81000
1998 71000
(...)
2020 68000
These are two separate columns of data.
Now I want to count the number of non-NaN observations in column value
between 2010 and 2020 per year i.e. the output should look like:
year count
2010 20
2011 31
(...)
2020 9
If any count is zero, it should show up as zero.
I know I can do it with a very simple loop (example below). But this is very inefficient for a large dataset. I was looking into accumarray, but could not figure out how to do it.
N = 300;
%Generate years vector
years = round(1996 + (2020-1996) .* (rand(N,1)));
years = sort(years);
% Generate values vector
values = rand(N,1);
NaN_position = rand(N,1)>.9; %Now put some random NaNs
values(NaN_position) = NaN;
count = 1;
for y=min(years):max(years)
indicator = years == y;
count_vals(count,1) = sum(not(isnan(values(indicator))));
count = count + 1;
end
Upvotes: 1
Views: 141
Reputation: 112659
Let the data be defined as:
years = [1995 1995 1995 1995 1995 1995 1995 1996 1996 1996 1996 1997 1997 1997 1997 1997 1997 1998 2020].';
values = [90000 53000 80000 60000 37000 42000 13102 35000 50000 32000 47000 36000 90000 NaN 90000 51500 81000 71000 68000].';
year_min = 1996;
year_max = 1998;
Then:
result_year = year_min:year_max;
result_count = histcounts(years(~isnan(values)), [result_year year_max+.5]);
The term year_max+.5
is needed in the second input of histcounts
because, as per the documentation, the last bin includes the right edge.
Upvotes: 2