Reputation: 1
So I'm working on creating a MATLAB script for the lag-time correction of several .mat files with the second set taken as reference. The .mat files come from two spectrometers that sample at 10 (LGR) and 5 (AERIS) Hz and each .mat file corresponds to a day from 00:00:00.000 to 23:59:59.900. In total for both instruments and file folders I have 54 files corresponding to 54 days. For simplicity I've downsampled the LGR data values to match the number of data of AERIS, since LGR has double their numbers. AERIS samples at every instant with a lag compared to LGR and this lag accumulates as we keep both instruments running. The goal here is to create a script that corrects the lag. Lag-time plot without time correction
%function [REG_RR,REG_LAG] = lagtime_shahid(LGR, Aeris, max_lag)
function [REG_RR,REG_LAG] = lagtime_shahid(LGR, corrected_Aeris, max_lag)
tic
% max_lag is the parameter indicating the maximum lag-time to be evaluated (30000)
% these two lines basically read and upload the LGR (reference)
% and AERIS (time correction needed) datasets
% the "dir" function get a list of files in the folder specified by
% "infile_folder_lgr" and "infile_folder_cpec"
% ch4*.mat means that there are random letters between ch4 and .mat
filelist_lgr = dir(fullfile(LGR, 'ch4*.mat'));
%filelist_aeris = dir(fullfile(Aeris,'aeris*.mat'));
filelist_aeris = dir(fullfile(corrected_Aeris,'corrected_aeris*.mat'));
REG_LAG = []; % this stores the results of lag-time evaluation in REG_LAG
REG_RR = []; % this stores the results of regression results (?) in REG_RR2
UTC = []; % this stores the results of time in UTC (?)
for idx = 1:numel(filelist_lgr) % "numel()" mi da il numero di elementi di filelist_lgr
indata_lgr = load([LGR,filelist_lgr(idx).name]);
%indata_aeris = load([Aeris,filelist_aeris(idx).name]);
indata_aeris = load([corrected_Aeris,filelist_aeris(idx).name]); % aeris has two more rows than lgr (x13 instead of x11)
% """ Those below are to visualize the time column for the first 20 lines in the
% Aeris and LGR datasets. They are to put in the command window.
% Following this, the Aeris are taken at 5Hz (.000 .200 .400 ...)
% while LGR is taken at 10Hz (.000 .100 .200 .300 .400 ...) """
% datestr(indata_lgr.data(1:20,1),'yyyy-mm-dd HH:MM:SS.FFF')
% datestr(indata_aeris.data(1:20,1),'yyyy-mm-dd HH:MM:SS.FFF')
% """ I need to take the lgr data evey two because I need it to be equal in
% number as the Aeris one since LGR is at 10Hz and Aeris at 5Hz and
% thus has one more value every two """
indata_lgr.data = indata_lgr.data(1:2:end, :);
DayDate = floor(indata_lgr.data(1,1));
for h_dex = 1:8 % runs from 1:8 because it's every 3 hours
% data selection keys (boolean)
key_data = (indata_lgr.data(:,1)>=DayDate+(h_dex-1)/8 & indata_lgr.data(:,1)<DayDate+h_dex/8);
lgr_sig = indata_lgr.data(key_data,3); % column dependent on dataset
aeris_sig = indata_aeris.data(key_data,3);
tmp = find(key_data);
utc(h_dex) = indata_aeris.data(tmp(1),1);
% find best filter and lag
% max_lag = 8000;
reg(h_dex) = find_xcorNbox(aeris_sig, lgr_sig, max_lag, 1);
if sum(isnan(aeris_sig)) > 0.5*numel(aeris_sig) % exchange lgr_sig with the aeris_sig
reg(h_dex).best_lag = NaN;
reg(h_dex).best_boxwidth = NaN;
end
%apply filtering and lagging to fast CO2 signal to match slow signal
%match_sig = lagNbox(fast_sig, reg.best_lag, reg.best_boxwidth);
end
REG_LAG = vertcat(REG_LAG,reg.best_lag);
REG_RR = vertcat(REG_RR,reg.r_maxmax);
UTC = vertcat(UTC,utc'); % UTC = vertcat(UTC,utc'); UTC is a 54x48 (days x half_hour) matrix = 2592 elements
% UTC must be reshaped to a 2592x1 matrix if I want to plot it against
% REG_RR and REG_LAG
% UTC_reshaped = reshape(UTC, [], 1);
% UTC = UTC';
end
%%
temp.time = UTC;
temp.lag = REG_LAG;
temp.corr = REG_RR;
figure;
plot(datetime(UTC,'ConvertFrom','datenum'),REG_LAG./5,'.red');
title('Lag-time between Aeris and LGR each 3 hour interval');
xlabel('Time [UTC]');
ylabel('Lag-time [s]');
This code is fine, as it helps to elaborate the two datasets and show with a plot how the points corresponding to the accumulated lag between the two instruments are displacing them through the days. The plot in this section and the one shown after are done using this code exchanging the normal and corrected AERIS dataset.
I'll add also the additional functions that are being used in this script below, "find_xcorNbox" and "lagNbox"
%% Simplified xcorNbox funciton by Shahid Naqui
% This function checks the relationship between the fast and slow signals
% by applying various "boxcar" filters, computing cross-coorelation at
% different lags and performing regressio analysis to find the optimal
% alignment and filtering parameters
% the fast and slow signals have nothing to do with the frequency they are
% analysing, it's just an old name for the to function inputs
function reg = find_xcorNbox(fast_sig, slow_sig, max_lag, boxwidths)
% find_xcorNbox: Computes the cross-correlation and regression between fast and slow signals
% after applying a boxcar filter of varying widths and shifting to find the optimal lag.
%
% Inputs:
% fast_sig - Fast signal data
% slow_sig - Slow signal data
% max_lag - Maximum lag to consider for cross-correlation
% boxwidths - Array of boxcar widths (positive, odd integers)
%
% Outputs:
% reg - Structure containing the best boxcar width, best lag, maximum correlation coefficient,
% and other statistical metrics.
% Initialize variables
num_widths = length(boxwidths);
r = NaN(max_lag, num_widths);
r_max = NaN(num_widths, 1);
slope = NaN(num_widths, 1);
sig_slp = NaN(num_widths, 1);
slope_star = NaN(num_widths, 1);
best_lags = NaN(num_widths, 1);
lags = -max_lag:max_lag;
cols = hot(num_widths);
% this loop iterates over each boxcar width provided in "boxwidths"
for b_dex = 1:num_widths
boxwidth = boxwidths(b_dex);
% Apply boxcar filter:
% for each boxcar width, the "fast_sig" is filtered using "lagNbox"
% function
deteriorated_sig = lagNbox(fast_sig, 0, boxwidth);
deteriorated_sig(1:max(boxwidths)) = NaN; % Set initial values to NaN
% Ensure signals are finite in both signals
% replaces NaN values with the mean of finite values to maintain
% signal integrity
B_bothfinite = isfinite(deteriorated_sig) & isfinite(slow_sig);
mean_det = mean(deteriorated_sig(B_bothfinite));
deteriorated_sig(~B_bothfinite) = mean_det;
mean_slow = mean(slow_sig(B_bothfinite));
slow_sig(~B_bothfinite) = mean_slow;
% Calculate cross-correlation:
% computes the cross-covariance between the "slow_sig" and the
% "deteriorated_sig". Then normalizes to get the cross-correlation.
% In the end, finds the lag that maximizes the cross-correlation
std_slow = std(slow_sig);
std_deteriorated = std(deteriorated_sig);
[est_xcov, x_lag] = xcov(slow_sig(max(boxwidths):end) - mean_slow, ...
deteriorated_sig(max(boxwidths):end) - mean_det, ...
max_lag, 'none');
est_xcorr = est_xcov / std_slow / std_deteriorated / (numel(slow_sig) - 1);
[max_xcorr, I] = max(est_xcorr);
best_lags(b_dex) = x_lag(I); % Store the best lag for the current boxwidth
% Adjust signal by the best lag
% shifts the deteriorated signal by the best lag found and adjusts
% the signal ends to maintain consistency after the shift
deteriorated_sig = circshift(deteriorated_sig, best_lags(b_dex), 1);
if best_lags(b_dex) > 0
deteriorated_sig(1:best_lags(b_dex)) = mean_det;
elseif best_lags(b_dex) < 0
deteriorated_sig(end+best_lags(b_dex)+1:end) = mean_det;
end
% Perform regression analysis between the adjusted deteriorated
% signal and the slow_sig, plus stores various regression metrics
% for each boxcar width
reg_all = reg_mg(deteriorated_sig(max(boxwidths):end), slow_sig(max(boxwidths):end));
r(1:length(lags), b_dex) = est_xcorr;
r_max(b_dex) = reg_all.r;
slope(b_dex) = reg_all.b;
sig_slp(b_dex) = reg_all.sig_b;
slope_star(b_dex) = reg_all.b_star;
end
% Determine the best boxwidth and lag
% Identifies the boxcar width and lag that yield the maximum
% correlation coefficient
[r_maxmax, J] = max(r_max);
best_boxwidth = boxwidths(J);
best_lag = best_lags(J);
% Output structure
reg = struct('best_boxwidth', best_boxwidth, 'best_lag', best_lag, 'r_maxmax', r_maxmax, ...
'lags', lags, 'boxwidths', boxwidths, 'r', r, 'r_max', r_max, ...
'slope', slope, 'sig_slp', sig_slp, 'slope_star', slope_star);
end
function deteriorated_sig = lagNbox(fast_sig, lag, boxwidth)
width = oddify(boxwidth); % if boxwidth is 0 this step can be neglected
% fast_aux = smooth(fast_sig,width); replace smoothing by filter
fast_aux = filter(ones(width,1)/width,1,fast_sig);
fast_aux(1:width) = NaN; % filter needs 'width' datapoints to spin up
if(lag==0)
deteriorated_sig = fast_aux;
else
deteriorated_sig = circshift(fast_aux,lag,1);
deteriorated_sig(1:lag) = NaN;
end
end
function oddN = oddify(N)
if(N<0 || N-round(N)~=0)
warning('box car width must be a positve, odd integer')
N = abs(round(N));
end
oddN = N + (1-mod(N,2));
end
So here I'm placing the script that I came up to for the time correction. Basing myself with the previous plot, I wanted to define a scaling factor k for each of the lag-time slopes shown in red. The general idea is to use this scaling factor (one for each slope), which follows from an assumption that the behavior of the slopes is linear, to help scale down the time variable of AERIS and with a linear interpolation "interp1" function on the data values hoping to get the new corrected values Here is the plot after the time correction, obtained after running the same script for the lag-time slopes. Some of the points are along the zero value and this is what I need to get, but the rest are randomly placed around and that is something I need to solve
%% Lag-time script with new approach
% Define the directories for AERIS and LGR data
Aeris = 'C:\Users\shahi\OneDrive\Desktop\Thesis Innsbruck 2024\lagtime analysis\Aeris\';
LGR = 'C:\Users\shahi\OneDrive\Desktop\Thesis Innsbruck 2024\lagtime analysis\LGR\';
% Get the list of AERIS and LGR files
aerisFiles = dir(fullfile(Aeris, 'aeris_ch4*.mat'));
lgrFiles = dir(fullfile(LGR, 'ch4_*.mat'));
%% Define segments and lag times (I identified eleven, 11, time periods from the lagtime.m script)
seg = [datenum([2023 10 13 06 00 00]) - datenum([2023 10 11 21 00 00]), ...
datenum([2023 10 16 18 00 00]) - datenum([2023 10 13 12 00 00]), ...
datenum([2023 10 17 03 00 00]) - datenum([2023 10 16 21 00 00]), ...
datenum([2023 11 01 03 00 00]) - datenum([2023 10 17 06 00 00]), ...
datenum([2023 11 07 09 00 00]) - datenum([2023 11 07 06 00 00]), ...
datenum([2023 11 14 06 00 00]) - datenum([2023 11 07 15 00 00]), ...
datenum([2023 11 16 12 00 00]) - datenum([2023 11 14 09 00 00]), ...
datenum([2023 11 17 06 00 00]) - datenum([2023 11 16 15 00 00]), ...
datenum([2023 11 27 06 00 00]) - datenum([2023 11 17 09 00 00]), ...
datenum([2023 11 28 06 00 00]) - datenum([2023 11 27 15 00 00]), ...
datenum([2023 12 03 21 00 00]) - datenum([2023 12 01 15 00 00])];
lag_time = [-495.4 + 18.8, -1236.4 + 54.6, -88.4 + 11.2, -5491.4 + 0.8, ...
-70.8 + 4.2, -2352 + 19.6, -743.4 + 25, -246 + 19.6, ...
-3521.4 + 36.6, -3199.8 + 2965.2, -802.8 + 19];
% Calculate day drift and k-factors
day_drift = lag_time ./ seg;
kfactor = 86400 ./ (86400 - day_drift);
% Here I just wanted to create a space where I could see all in one place
% the results obtained from above
results = struct();
results.header = {'seg', 'lag_time', 'kfactor'};
results.data = struct('seg', num2cell(seg), 'lag_time', num2cell(lag_time), 'kfactor', num2cell(kfactor));
%% Process each file
offset = 0; % Initializing offset variable
corrected_Aeris = 'corrected_Aeris'; % Directory to save corrected AERIS data
if ~exist(corrected_Aeris, 'dir')
mkdir(corrected_Aeris);
end
for idx = 1:numel(lgrFiles)
try
% Load AERIS and LGR data
aerisData = load(fullfile(Aeris, aerisFiles(idx).name));
lgrData = load(fullfile(LGR, lgrFiles(idx).name));
aeris = aerisData.data;
lgr = lgrData.data;
lgr = lgr(1:2:end, :); % Downsample LGR data to match the Aeris sample rate (5Hz)
% Determine the segment and k-factor
% Initialize where to store the segment and the k-factor corresponding to the data
segindex = [];
kfactor_used = [];
for i = 1:length(seg)-1
if aeris(1,1) >= seg(i) && aeris(1,1) < seg(i+1) % Check if the first timestamp of AERIS falls within the current segment
% If the condition is met I then assign the values
segindex = seg(i);
kfactor_used = kfactor(i);
break;
end
end
% Handles the case where aeris(1,1) is greater than or equal the last...
% segment assigning the last segment and k-factor
if isempty(segindex) && aeris(1,1) >= seg(end)
segindex = seg(end);
kfactor_used = kfactor(end);
end
% Correct AERIS timestamps
startdate = aeris(1, 1);
XX = aeris(:, 1);
corr_time = offset + [datenum(startdate); datenum(startdate) + cumsum(diff(datenum(XX)) * kfactor_used/86400)];
% Remove duplicate timestamps
[corr_time_unique, unique_idx] = unique(corr_time);
aeris_values_unique = aeris(unique_idx, 3); % Assuming the methane values are in the 3rd column
% Interpolate AERIS data
corrected_aeris_values = interp1(corr_time_unique, aeris_values_unique, lgr(:, 1), 'linear', 'extrap');
% Create a new dataset with LGR timestamps and corrected AERIS values
corrected_aeris_data = [lgr(:, 1), aeris(:, 2), corrected_aeris_values]; % Keeping the second column (diff) as is
% Save the corrected AERIS data
corrected_aeris_Data = struct('data', corrected_aeris_data);
save(fullfile(corrected_Aeris, ['corrected_' aerisFiles(idx).name]), '-struct', 'corrected_aeris_Data');
offset = offset + aeris(end, 1) - corr_time(end,1);
catch ME
fprintf('Error processing file %s: %s\n', aerisFiles(idx).name, ME.message);
end
end
disp('Time drift correction and interpolation completed.');
I need to find a way to clear out those points that are not displaced along 0s value and bring them on this y-axis value, as this is the task. Positive lag-time points are totally wrong and should not be there at all as the original plot without correction has only negative point distribution.
Upvotes: 0
Views: 78