Reputation: 666
I have a large data set of minutely data with multiple columns that needs to be converted from minute to hourly resolution.
I am new to MATLAB and tried:
data_minute = rand(data); % synthetic data
data_hour = mean(reshape(data_minute, 60, []))
which only gives me the hourly data from one row.
I wasn't able to work through every column with something like:
for i = 1:n_columns
data_hour(:,i) = mean(reshape(data_minute(:,i),60, []));
end
Trying a for-loop to sample every 60 data points also didn't work out.
Searching Google for a solution didn't turn up anything I understood.
Update:
For clarification the data looks something like this:
minute value
1 501
2 479
3 449
4 463
5 404
6 173
7 141
8 141
9 141
10 140
11 140
12 140
13 140
14 202
15 206
16 206
.. ...
525604 120
Upvotes: 0
Views: 263
Reputation: 25140
This sounds like a job for timetable and retime. First make a timetable, using a duration for the "time" variable. It's easy to create a duration array using the minutes function. For example:
>> tt = timetable(minutes(0:1000)', rand(1001, 1));
>> % Just look at the first few rows of 'tt':
>> head(tt)
ans =
8×1 timetable
Time Var1
_____ ________
0 min 0.31907
1 min 0.98605
2 min 0.71818
3 min 0.41318
4 min 0.09863
5 min 0.73456
6 min 0.63731
7 min 0.073842
>> % use 'retime' to get the hourly means:
>> rt = retime(tt, 'hourly', 'mean')
rt =
17×1 timetable
Time Var1
_______ _______
0 min 0.47755
60 min 0.47877
120 min 0.48007
180 min 0.55399
240 min 0.5142
300 min 0.5656
360 min 0.50957
420 min 0.48986
480 min 0.49568
540 min 0.55133
600 min 0.49981
660 min 0.53677
720 min 0.49343
780 min 0.53409
840 min 0.47901
900 min 0.55287
960 min 0.48173
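For a matrix with several measurement columns, as in the question, the same idea might look like the sketch below. It assumes the data is a numeric matrix with one row per minute, no explicit time column, and no gaps. The name data_minute and the sizes are placeholders, and array2timetable is used here in place of the timetable constructor shown above.

```matlab
% Synthetic stand-in for the real data: 3 hours of minute data, 4 columns (assumed sizes)
data_minute = rand(180, 4);

% Row times as durations: minute 0, 1, 2, ... (assumes the record has no gaps)
t = minutes(0:size(data_minute, 1) - 1)';

% One timetable variable per column of the matrix
tt = array2timetable(data_minute, 'RowTimes', t);

% Hourly mean of every column at once
rt = retime(tt, 'hourly', 'mean');

% Back to a plain numeric matrix: one row per hour, one column per measurement
data_hour = rt{:, :};
```

Because retime works on every variable of the timetable, no loop over the columns is needed.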
Upvotes: 2
Reputation: 10792
The goal is to downsample the data, either by aggregating or by interpolating the measurements grouped by hour.
Take this example data matrix:
M = [10, 3,4,5,6;
2000, 3,4,3,5;
5000, 4,4,4,4]
Here the first column is the time in seconds, and the other columns are your measurements.
Solution 1: Aggregation with accumarray
% Start by converting the time to hours (3600 seconds per hour).
hour = ceil(M(:,1)/3600)
% Extract the measurements.
val = M(:,2:end)
% nrow = number of measurement columns.
nrow = size(val,2);
% Find the unique hours and an index mapping each row to its hour.
[uid,~,id] = unique(hour);
% Build subscripts that group the measurements by hour (first column) and by measurement column (second column).
sub = [repmat(id,nrow,1),kron(1:nrow,ones(1,length(id))).']
sub =
     1     1
     1     1
     2     1
     1     2
     1     2
     2     2
     1     3
     1     3
     2     3
     1     4
     1     4
     2     4
%We calculate the result using accumarray (first column = hour):
RES = [uid,accumarray(sub,val(:),[],@median)] %if you want the mean choose @mean
RES =
    1.0000    3.0000    4.0000    4.0000    5.5000
    2.0000    4.0000    4.0000    4.0000    4.0000
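Applied to regularly sampled minute data like that in the question, the same accumarray grouping could be sketched as follows. This assumes one row per minute with no explicit time column, so the hour index is derived from the row number. The names data_minute, hour_id and data_hour are illustrative, not from the original post.

```matlab
% Synthetic stand-in: 180 minutes of data, 4 measurement columns (assumed sizes)
data_minute = rand(180, 4);
[n_rows, n_cols] = size(data_minute);

% Hour index of every row: rows 1..60 -> 1, rows 61..120 -> 2, ...
hour_id = ceil((1:n_rows)' / 60);

% Row subscript = hour, column subscript = measurement column
sub = [repmat(hour_id, n_cols, 1), kron(1:n_cols, ones(1, n_rows)).'];

% Mean of each (hour, column) group; result is n_hours x n_cols
data_hour = accumarray(sub, data_minute(:), [], @mean);
```

Unlike a plain reshape into blocks of 60, this grouping does not require the number of rows to be an exact multiple of 60; a partial last hour is simply averaged over the rows it has.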
Solution 2: Interpolation with interp1
You can interpolate your data with interp1
interp_second = unique(floor(M(:,1)/3600))*3600
% creation of a unique index
uid = unique(ceil(M(:,1)/3600))
% We extract the measurements
val = M(:,2:end)
% Result (first column = hour)
RES = [uid,interp1(M(:,1),val,interp_second)]
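One thing to watch with the example matrix: the query point at 0 seconds lies before the first sample at 10 seconds, and by default interp1 returns NaN for query points outside the sampled range. A minimal sketch of one way to handle that, assuming linear extrapolation is acceptable for your data (the names res_default and res_extrap are illustrative):

```matlab
M = [10,   3,4,5,6;
     2000, 3,4,3,5;
     5000, 4,4,4,4];
val = M(:,2:end);

% Query times at the start of each hour covered by the data, in seconds
interp_second = unique(floor(M(:,1)/3600))*3600;   % [0; 3600]

% Default behaviour: the query at 0 s is outside [10, 5000] and yields NaN
res_default = interp1(M(:,1), val, interp_second);

% With 'extrap', out-of-range queries are linearly extrapolated instead
res_extrap = interp1(M(:,1), val, interp_second, 'linear', 'extrap');
```

Whether extrapolating is appropriate depends on the measurements; dropping or clamping the out-of-range query points is an equally valid choice.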
Conclusion
I would recommend solution 1, because the method is more robust.
Upvotes: 1