Dominik

Reputation: 792

Find the mean or median date of an event

I have a dataset from which I have extracted the date on which an event occurred. The dates are stored as numbers in MMDDYY format, but MATLAB does not show leading zeros, so they often appear as MDDYY.

Is there a method to find the mean or median date (I could use either)? median works fine when there is an odd number of dates, but for an even number I believe it averages the two middle values, which doesn't produce a sensible MMDDYY value. I've been trying to convert the dates to a MATLAB date format with regexp and put them back together, but I haven't gotten it to work. Thanks.

dates=[32381 41081  40581  32381  32981 41081   40981  40581];

Upvotes: 2

Views: 1691

Answers (4)

Carlos

Reputation: 95

Try this:

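dates=[32381 41081 40581 32381 32981 41081 40981 40581];
d=zeros(1,length(dates));
for i=1:length(dates)
    % zero-pad to six digits so that e.g. 32381 is read as 032381 (MMDDYY)
    d(i)=datenum(sprintf('%06d',dates(i)),'mmddyy');
end
m=mean(d);                    % mean as a serial date number
m_str=datestr(m,'dd.mm.yy')   % convert back to a readable date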

I hope this is useful. Regards.

Upvotes: 1

mwengler

Reputation: 2778

The answer above shows how to represent dates as numbers.

I will add to that your issue of finding the median of the list. MATLAB's default median function averages the two middle values when there is an even number of values.

But you can do it yourself! Try this:

dates; % your array of dates as serial date numbers (e.g. from datenum)
sdates = sort(dates);
mediandate = sdates(round((length(sdates)+1)/2));
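Putting the conversion and the middle-element selection together, a minimal sketch might look like the following. It assumes the sample data from the question and that all years belong to the 20th century; the variable names are only illustrative.

```matlab
dates = [32381 41081 40581 32381 32981 41081 40981 40581];  % MMDDYY, leading zero dropped

% Split each number into month, day and two-digit year
yy = mod(dates, 100);
dd = mod(floor(dates/100), 100);
mm = floor(dates/10000);

% Assumption: every year is 19YY
numDates = datenum(1900 + yy, mm, dd);

% Pick actual elements of the sorted list instead of averaging
sdates = sort(numDates);
n = numel(sdates);
lowerMedian = sdates(floor((n+1)/2));   % lower of the two middle dates when n is even
upperMedian = sdates(ceil((n+1)/2));    % upper of the two middle dates when n is even

fprintf('Lower median: %s\n', datestr(lowerMedian, 'yyyy-mm-dd'));
fprintf('Upper median: %s\n', datestr(upperMedian, 'yyyy-mm-dd'));
```

Because both results are elements of the original list, they are always dates on which an event actually occurred. For an odd number of dates the two values coincide.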

Upvotes: 0

Gunther Struyf

Reputation: 11168

You can use datenum to convert dates to serial date numbers, which count days (1 is 01-Jan-0000, 2 is 02-Jan-0000, 367 is 01-Jan-0001, etc.):

strDate='27112011';
numDate = datenum(strDate,'ddmmyyyy')

Any arithmetic operation can then be performed on these date numbers, like taking a mean or median:

mean(numDates)
median(numDates)

The only problem here is that your dates are not strings but numbers. Luckily, datenum also accepts numeric input, but you have to supply the year, month and day as separate elements of a vector:

numDate = datenum([year month day])

or as rows in a matrix if you have multiple timestamps.

So for your specified example data:

dates=[32381 41081  40581  32381  32981 41081   40981  40581];
years  = mod(dates,100);
dates  = (dates-years)./100;
days   = mod(dates,100);
months = (dates-days)./100;
years = years + 1900; % set the years to the 20th century

numDates = datenum([years(:) months(:) days(:)]);
fprintf('The mean date is %s\n', datestr(mean(numDates)));
fprintf('The median date is %s\n', datestr(median(numDates)));

In this example I converted the resulting mean and median back to a readable date format using datestr, which takes the serial date number as input.
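On MATLAB R2014b or later, the datetime type may be a tidier alternative to serial date numbers. The following is only a sketch of that variant, again assuming the dates all fall in the 20th century; the variable names are illustrative.

```matlab
dates = [32381 41081 40581 32381 32981 41081 40981 40581];  % MMDDYY, leading zero dropped

% Split each number into month, day and two-digit year
yy = mod(dates, 100);
dd = mod(floor(dates/100), 100);
mm = floor(dates/10000);

% Assumption: every year is 19YY
t = datetime(1900 + yy, mm, dd);

meanDate   = mean(t);     % can land on a time of day other than midnight
medianDate = median(t);   % computed directly on the datetime values

disp(meanDate)
disp(medianDate)
```

With datetime there is no need to call datestr afterwards, since the results display as dates already.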

Upvotes: 5

Dan Nissenbaum

Reputation: 13918

Store the dates as YYMMDD, rather than as MMDDYY. This has the useful side effect that the numeric order of the dates is also the chronological order.

Here is the pseudo-code for a function that you could write.

foreach date:
    year = date % 100
    date = (date - year) / 100
    day = date % 100
    date = (date - day) / 100
    month = date
    newdate = year * 100 * 100 + month * 100 + day
end for

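Once the dates are in YYMMDD format, find the median numerically; this is also the chronological median. For an even number of dates, pick one of the two middle elements rather than averaging them, since the average of two YYMMDD numbers is not necessarily a valid date.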
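In MATLAB the pseudo-code above could be sketched without a loop, roughly as follows. This assumes the sample data from the question and that all dates fall within the same century, since two-digit years only sort chronologically in that case; the variable names are illustrative.

```matlab
dates = [32381 41081 40581 32381 32981 41081 40981 40581];  % MMDDYY, leading zero dropped

% Split each number into month, day and two-digit year
yy = mod(dates, 100);
dd = mod(floor(dates/100), 100);
mm = floor(dates/10000);

% Reassemble as YYMMDD so that numeric order equals chronological order
yymmdd = yy*10000 + mm*100 + dd;

% Take a middle element of the sorted list rather than averaging two values
sorted = sort(yymmdd);
medianYYMMDD = sorted(ceil(numel(sorted)/2));   % lower middle element when the count is even

fprintf('Median date (YYMMDD): %06d\n', medianYYMMDD);
```

Note that this gives a median only; a mean of YYMMDD numbers would not be meaningful, because the gaps between the numbers do not correspond to numbers of days.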

Upvotes: 0
