Phil

Reputation: 666

Resampling multiple data columns from minutes to hours in matlab

I have a large data set of minute-by-minute data with multiple columns that needs to be converted from minutes to hours.

I am new to MATLAB and tried

data_minute = rand(data);  % synthetic data
data_hour = mean(reshape(data_minute, 60, []))

which only gives me the hourly data from one row.

I wasn't able to work through every column with something like:

for i = 1:n_columns
data_hour(:,i) = mean(reshape(data_minute(:,i),60, []));
end

Trying a for loop to sample every 60 data points also didn't work out.

Searching Google for a solution didn't turn up anything I understood.

Update:

For clarification, the data looks something like this:

minute   value
1   501
2   479
3   449
4   463
5   404
6   173
7   141
8   141
9   141
10  140
11  140
12  140
13  140
14  202
15  206
16  206
..  ...
525604 120

Upvotes: 0

Views: 263

Answers (2)

Edric

Reputation: 25140

This sounds like a job for timetable and retime. First make a timetable, using a duration for the "time" variable - it's easy to create a duration array using the minutes function. For example:

>> tt = timetable(minutes(0:1000)', rand(1001, 1));
>> % Just look at the first few rows of 'tt':
>> head(tt)
ans =
  8×1 timetable
    Time       Var1  
    _____    ________
    0 min     0.31907
    1 min     0.98605
    2 min     0.71818
    3 min     0.41318
    4 min     0.09863
    5 min     0.73456
    6 min     0.63731
    7 min    0.073842
>> % use 'retime' to get the hourly means:
>> rt = retime(tt, 'hourly', 'mean')
rt =
  17×1 timetable
     Time       Var1  
    _______    _______
    0 min      0.47755
    60 min     0.47877
    120 min    0.48007
    180 min    0.55399
    240 min     0.5142
    300 min     0.5656
    360 min    0.50957
    420 min    0.48986
    480 min    0.49568
    540 min    0.55133
    600 min    0.49981
    660 min    0.53677
    720 min    0.49343
    780 min    0.53409
    840 min    0.47901
    900 min    0.55287
    960 min    0.48173
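
The question has several data columns, so here is a minimal sketch of the same idea for a matrix. It assumes the values sit in a plain numeric matrix with one row per minute; the names `data`, `nMinutes`, and `hourlyMean` are made up for illustration, and the random data stands in for the real measurements.

```matlab
% Hypothetical stand-in for the real data: 180 minutes, 3 value columns.
nMinutes = 180;
data = rand(nMinutes, 3);

% Build a timetable with one variable per column; row times are durations in minutes.
tt = array2timetable(data, 'RowTimes', minutes(0:nMinutes-1).');

% Hourly mean of every column at once.
rt = retime(tt, 'hourly', 'mean');

% Pull the result back out as a plain matrix (one row per hour).
hourlyMean = rt.Variables;
```

`retime` applies the aggregation to every variable in the timetable, so no loop over the columns is needed.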

Upvotes: 2

obchardon

Reputation: 10792

We want to downsample the data by aggregating or interpolating all the measurements grouped by hour.

If we take this example data matrix:

M = [10,   3,4,5,6;
     2000, 3,4,3,5;
     5000, 4,4,4,4]

Say that the first column corresponds to the time in seconds, and the other columns correspond to your measurements.

Solution 1: Aggregation with accumarray

% Start by converting the time to an hour index (3600 seconds in one hour).
hour = ceil(M(:,1)/3600)

% Extract the measurements
val = M(:,2:end)

% ncol = number of measurement columns
ncol = size(val,2);

% Find the unique hours; id maps each row to its hour group
[uid,~,id] = unique(hour);

% Build a subscript matrix grouping the measurements by hour and by column
sub = [repmat(id,ncol,1),kron(1:ncol,ones(1,length(id))).']
sub = 
   1   1
   1   1
   2   1
   1   2
   1   2
   2   2
   1   3
   1   3
   2   3
   1   4
   1   4
   2   4
% Compute the result using accumarray (first column = hour):
RES = [uid,accumarray(sub,val(:),[],@median)] % use @mean if you want the mean
RES =

1.0000   3.0000   4.0000   4.0000   5.5000
2.0000   4.0000   4.0000   4.0000   4.0000
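
Applied to the layout in the question (first column = minute index starting at 1, remaining columns = values), the same accumarray grouping could look roughly like the sketch below. The names `data`, `hourIdx`, and `hourlyMean` are hypothetical, the data is random, and a plain loop over columns is used instead of the combined subscript matrix to keep it readable.

```matlab
% Hypothetical stand-in for the question's data: 180 minutes, 3 value columns.
nMinutes = 180;
data = [(1:nMinutes).', rand(nMinutes, 3)];

% Minutes 1..60 -> hour 1, 61..120 -> hour 2, and so on.
hourIdx = ceil(data(:,1) / 60);

val = data(:, 2:end);
nCols = size(val, 2);
nHours = max(hourIdx);

% Aggregate each column by hour; swap @mean for @median if preferred.
hourlyMean = zeros(nHours, nCols);
for c = 1:nCols
    hourlyMean(:, c) = accumarray(hourIdx, val(:, c), [nHours 1], @mean);
end
```

Because the grouping is driven by `hourIdx` rather than by row position, this also works when some minutes are missing, as long as every hour up to `nHours` contains at least one sample (empty groups are filled with 0 by default).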

Solution 2: Interpolation with interp1

You can also interpolate your data with interp1:

% Query times: the start of each hour, in seconds
interp_second = unique(floor(M(:,1)/3600))*3600

% Create a unique hour index
uid = unique(ceil(M(:,1)/3600))

% Extract the measurements
val = M(:,2:end)

% Result (first column = hour)
RES = [uid,interp1(M(:,1),val,interp_second)]
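
One thing to watch with this approach: interp1 returns NaN for query times outside the sampled range unless extrapolation is requested. The sketch below reuses the example matrix to show this; the variable names `t`, `xq`, `insideOnly`, and `withExtrap` are made up for illustration, and whether extrapolation is appropriate depends on your data.

```matlab
% Example matrix from above: column 1 = time in seconds.
M = [10,   3,4,5,6;
     2000, 3,4,3,5;
     5000, 4,4,4,4];
t = M(:,1);
val = M(:,2:end);
xq = [0; 3600];   % start of each hour, in seconds

% Default linear interpolation: 0 s lies before the first sample (10 s),
% so the first row of the result is NaN.
insideOnly = interp1(t, val, xq);

% Assumption: linear extrapolation is acceptable for your data.
withExtrap = interp1(t, val, xq, 'linear', 'extrap');
```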

Conclusion

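I would recommend solution 1, because it is more robust: it aggregates every measurement within each hour and does not depend on the query times falling inside the sampled range.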

Upvotes: 1
