Reputation: 23
I have multiple data sets from different sources of varying length. One txt file has time in seconds , other one has in 10hz ( varying at times) so my data is messy. I am trying to compare these kind of data sets , but I need a smart way to sync the timeseries first along with adjacent columns of data. Any help would be appreciated.
Here are two example data sets:
Time data 1 data 2 data 3
12:19:00 PM 0.06875 0.1625 0
12:19:01 PM 0.06875 0.1625 0
12:19:02 PM 0.06875 0.1625 0
12:19:05 PM 0.06875 0.1625 0
12:20:06 PM 0.06875 0.15625 0
12:20:00 PM 0.06875 0.1625 0.02300251
size of data one is 600, 10
Data set 2 looks similar with more columns and different start and end time with different frequency so size of data 2 is [1000, 40]
Time data 4 data 5 data 6 data 7 ...
12:00:00 PM 0.45875 0.1625 0
12:19:01 PM 0.06875 0.1625 0
12:19:01 PM 0.06875 0.1625 0
12:19:01 PM 0.06875 0.1625 0
12:20:00 PM 0.06875 0.15625 0
12:20:00 PM 0.06875 0.1625 0.02300251
...
1.00.20 PM ... ... ...
sorry if my question is not clear.
I am looking to generate a third time axis based on the shorter time axis. so for this case I have to average the second file into 2 sec intervals ( taking into account missing data) Objective is to compare data 1 and data 2 from data set 1 and data set 2 at the same time stamps
Size of file1 is not equal to size of file 2
Upvotes: 2
Views: 7026
Reputation: 760
The best way for comparing the two sets of data is to time sync it.
x1=dlmread('C:\folder\yourfile1.txt');
x2=dlmread('C:\folder\yourfile2.txt');
t1 = x1(:,1);
tsr1=timeseries(x1,t1);
t2 = x2(:,1);
tsr2=timeseries(x2,t2);
% Sync 1 and 2
[new_ts12_1 new_ts12_2] = synchronize(tsr1,tsr2,'Uniform','Interval',1)
You can decide how you want to interpolate the data; the default is linear
Upvotes: 1
Reputation: 38052
You'll need to interpolate; you can use interp1
for your case.
Use like so:
new_data = interp1(times, [data1 data2 data3 ...], new_times)
your times
need not be sorted. new_times
is then the (equally-spaced) times you want values on.
This results in linear interpolation. You could do
new_data = interp1(times, [data1 data2 data3 ...], new_times, 'cubic')
to use a cubic interpolant. See help interp1
for more information.
Note that new_data
will be size(new_times) x size([data1 data2 data3 ...])
.
EDIT:
So, for your case, this is how you'd use it:
% Your data sets
dataset_1 = {...
'12:19:00 PM 0.06875 0.1625 0'
'12:19:01 PM 0.06875 0.1625 0'
'12:19:02 PM 0.06875 0.1625 0'
'12:19:05 PM 0.06875 0.1625 0'
'12:20:06 PM 0.06875 0.15625 0'
'12:20:00 PM 0.06875 0.1625 0.02300251'
};
dataset_2 = {...
'12:00:00 PM 0.45875 0.1625 0'
'12:19:01 PM 0.06875 0.1625 0'
'12:19:01 PM 0.06875 0.1625 0'
'12:19:01 PM 0.06875 0.1625 0'
'12:20:00 PM 0.06875 0.15625 0'
'12:20:00 PM 0.06875 0.1625 0.02300251'
};
% (This step is probably not needed (or should be changed) if you're
% reading from file)
dataset_1 = cellfun(@(x) textscan(x, '%s%s%f%f%f'), dataset_1, 'UniformOutput', false);
dataset_2 = cellfun(@(x) textscan(x, '%s%s%f%f%f'), dataset_2, 'UniformOutput', false);
% Extract & convert times
times_1 = cellfun(@(x) datenum( [x{1}{1} x{2}{1}] ), dataset_1);
times_2 = cellfun(@(x) datenum( [x{1}{1} x{2}{1}] ), dataset_2);
% Prepare the data for interpolation
dataset_1 = cellfun(@(x) [x{3:end}], dataset_1, 'UniformOutput', false);
dataset_1 = cell2mat(dataset_1);
dataset_2 = cellfun(@(x) [x{3:end}], dataset_2, 'UniformOutput', false);
dataset_2 = cell2mat(dataset_2);
[times_12, inds] = unique([times_1; times_2]); % (must use distrinct times)
dataset_12 = [dataset_1; dataset_2];
dataset_12 = dataset_12(inds,:); % (and corresponding data)
% Create a new times vector, that increases in regular steps
% (100 for this example)
times_3 = linspace(min(times_12), max(times_12), 100);
% Now interpolate
dataset_3 = interp1(times_12, dataset_12, times_3)
Upvotes: 0