I am new to R and have just started my research work, so please excuse me if the answer is obvious. I have tried to find the answer in other questions, but I am not sure whether I am using the right search terms. I have also looked at this similar, but not identical, question: (R Stats: Comparing timestamps in two dataframes).
For my research question, we wanted to measure episodes of heart arrhythmia (atrial fibrillation, "Afib") in patients. We did this using two different methods: ECG and PPG.
Therefore, we have two data frames per patient.
ECG:
start | end | type
19.10.2020 11:34:53 | 19.10.2020 11:35:24 | noise
19.10.2020 22:49:53 | 19.10.2020 22:59:53 | Afib
19.10.2020 23:00:21 | 19.10.2020 23:10:53 | Afib
19.10.2020 23:47:14 | 19.10.2020 23:56:22 | Afib
PPG:
start | end | type
19.10.2020 11:25:53 | 19.10.2020 11:40:24 | noise
19.10.2020 22:49:53 | 19.10.2020 22:59:53 | Afib
19.10.2020 23:00:21 | 19.10.2020 23:15:53 | Afib
19.10.2020 23:42:04 | 19.10.2020 23:54:38 | Afib
20.10.2020 00:02:14 | 20.10.2020 00:19:26 | Afib
Each row represents either one episode of Afib or one episode of noise (the signal was not good enough for detection). The measurement was continuous, but only arrhythmic events were documented.
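To make the example reproducible, the two tables above can be read into R with `read.table()` using `|` as the separator, and the time stamps parsed with lubridate's `dmy_hms()`. This is a sketch that assumes the columns are called `start`, `end` and `type` as in the tables; the helper name `read_episodes` is hypothetical, and `tz = "UTC"` is an assumption (the code further down uses `Sys.timezone()`) chosen so the result does not depend on the local clock.

```r
library(lubridate)

# The tables from the question, pasted as text; "|" separates the columns
ecg_txt <- "19.10.2020 11:34:53 | 19.10.2020 11:35:24 | noise
19.10.2020 22:49:53 | 19.10.2020 22:59:53 | Afib
19.10.2020 23:00:21 | 19.10.2020 23:10:53 | Afib
19.10.2020 23:47:14 | 19.10.2020 23:56:22 | Afib"

ppg_txt <- "19.10.2020 11:25:53 | 19.10.2020 11:40:24 | noise
19.10.2020 22:49:53 | 19.10.2020 22:59:53 | Afib
19.10.2020 23:00:21 | 19.10.2020 23:15:53 | Afib
19.10.2020 23:42:04 | 19.10.2020 23:54:38 | Afib
20.10.2020 00:02:14 | 20.10.2020 00:19:26 | Afib"

# Hypothetical helper: text table -> data frame with POSIXct start/end
read_episodes <- function(txt) {
  # strip.white = TRUE removes the spaces around each "|"
  df <- read.table(text = txt, sep = "|", header = FALSE,
                   col.names = c("start", "end", "type"),
                   strip.white = TRUE, stringsAsFactors = FALSE)
  # parse "day.month.year hour:minute:second" into date-times
  df[c("start", "end")] <- lapply(df[c("start", "end")], dmy_hms, tz = "UTC")
  df
}

ECG <- read_episodes(ecg_txt)
PPG <- read_episodes(ppg_txt)
str(PPG)
```

Converting both columns with `lapply()` avoids repeating the `dmy_hms()` call once per column and per data frame.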
We want to compare the second method (PPG) with the first method (ECG) to see whether it would be a viable alternative for detecting heart arrhythmia in patients. Hence we want to find:
true positives: episodes which were detected by both the gold standard (ECG) and PPG (row 2 in the example above)
false positives: episodes that were only detected by the PPG method (row 5 in the example above)
and so forth...
So far, I have converted the timestamps with lubridate, so that R treats them as date-times and not just text, using these lines:
library(lubridate)
ppg$start <- dmy_hms(ppg$start, tz = Sys.timezone())
ppg$end <- dmy_hms(ppg$end, tz = Sys.timezone())
which gives, for example:
2020-10-19 22:49:53 | 2020-10-19 22:59:53 | Afib
The condition for a true positive is that an ECG episode overlaps with a PPG episode for at least 30 seconds.
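The length of the overlap between two episodes is the time from the later of the two starts to the earlier of the two ends; if that difference is negative, the episodes do not overlap at all. A base-R sketch for a single pair (the helper names `overlap_secs` and `as_time` are hypothetical, and UTC is assumed), checked against ECG row 3 and PPG row 3 from the tables:

```r
# Overlap in seconds between episodes [s1, e1] and [s2, e2]; 0 if they are disjoint
overlap_secs <- function(s1, e1, s2, e2) {
  # as.numeric() on POSIXct gives seconds since 1970-01-01
  secs <- pmin(as.numeric(e1), as.numeric(e2)) - pmax(as.numeric(s1), as.numeric(s2))
  pmax(secs, 0)
}

# Parse the "dd.mm.yyyy hh:mm:ss" format used in the tables
as_time <- function(x) as.POSIXct(x, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

ov <- overlap_secs(as_time("19.10.2020 23:00:21"), as_time("19.10.2020 23:10:53"),
                   as_time("19.10.2020 23:00:21"), as_time("19.10.2020 23:15:53"))
ov          # 632
ov >= 30    # TRUE, so this pair counts as a true positive
```

Because `pmin()` and `pmax()` are vectorised, the same helper also works element-wise on whole columns of equal length.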
How would I go about implementing this in R to count true and false positives?
Thank you for your help.
The following function is probably more complicated than necessary, but I think it does what the question asks for.
Its input arguments are:
X: an ECG data.frame
Y: a PPG data.frame
duration: the minimum overlap duration, in seconds
startcol: the name of the start date-times column
endcol: the name of the end date-times column
noisecol: the column that holds the type; if it is "noise", the row is not counted
noiseval: a vector of values not to be considered
And the output is a list with members TP and FP.
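library(lubridate)  # interval(), int_overlaps(), int_start(), int_end(), int_length()

overlapDuration <- function(X, Y, duration = 30, startcol, endcol, noisecol, noiseval){
  # length in seconds of the overlap of intervals x and y, NA if they don't overlap
  overlap_length <- function(x, y){
    if(int_overlaps(x, y)){
      xstart <- int_start(x)
      xend <- int_end(x)
      ystart <- int_start(y)
      yend <- int_end(y)
      start <- max(xstart, ystart)
      end <- min(xend, yend)
      int <- interval(start, end)
      int_length(int)
    } else NA
  }
  xname <- deparse(substitute(X))
  yname <- deparse(substitute(Y))
  Xi <- interval(X[[startcol]], X[[endcol]])
  Yi <- interval(Y[[startcol]], Y[[endcol]])
  # overl[k, l] is the overlap between row k of X and row l of Y
  overl <- sapply(Yi, \(x){
    sapply(Xi, overlap_length, x)
  })
  # discard every pair in which either episode is noise
  i <- which(X[[noisecol]] %in% noiseval)
  j <- which(Y[[noisecol]] %in% noiseval)
  overl[i, ] <- NA
  overl[, j] <- NA
  # pairs overlapping for at least `duration` seconds are true positives
  w <- which(!is.na(overl) & overl >= duration, arr.ind = TRUE)
  colnames(w) <- c(xname, yname)
  TP <- cbind(w, secs = overl[w])
  # false positives: non-noise rows of Y not matched by any row of X
  FP <- which(!(seq_len(nrow(Y)) %in% w[, yname] | Y[[noisecol]] %in% noiseval))
  list(TP = TP, FP = FP)
}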
minduration <- 30
start <- "start"
end <- "end"
typecol <- "type"
noise <- "noise"
overlapDuration(ECG, PPG, minduration, start, end, typecol, noise)
#$TP
# ECG PPG secs
#[1,] 2 2 600
#[2,] 3 3 632
#[3,] 4 4 444
#
#$FP
#[1] 5
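If the nested `sapply()` calls become slow for long recordings, the whole overlap matrix can be computed in one step with `outer()` on the numeric start and end times. The sketch below is an alternative to the function above, not part of it: the name `match_episodes` is hypothetical, it assumes POSIXct `start`/`end` columns and a `type` column, treats `"noise"` rows on either side as excluded, and additionally counts false negatives (ECG episodes with no PPG episode overlapping for at least `duration` seconds).

```r
# Hypothetical helper: ECG is the reference, PPG the method under test
match_episodes <- function(ECG, PPG, duration = 30, noise = "noise") {
  # pairwise overlap in seconds: rows = ECG episodes, columns = PPG episodes
  ov <- outer(as.numeric(ECG$end), as.numeric(PPG$end), pmin) -
        outer(as.numeric(ECG$start), as.numeric(PPG$start), pmax)
  ov[ECG$type %in% noise, ] <- 0   # ignore noise episodes on either side
  ov[, PPG$type %in% noise] <- 0
  hit <- ov >= duration            # pairs that overlap long enough
  list(
    TP = which(hit, arr.ind = TRUE),                        # matched (ECG row, PPG row) pairs
    FP = which(colSums(hit) == 0 & !PPG$type %in% noise),   # PPG-only episodes
    FN = which(rowSums(hit) == 0 & !ECG$type %in% noise)    # ECG-only episodes
  )
}

# Example data from the question, parsed in UTC (an assumption)
as_time <- function(x) as.POSIXct(x, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")
ECG <- data.frame(
  start = as_time(c("19.10.2020 11:34:53", "19.10.2020 22:49:53",
                    "19.10.2020 23:00:21", "19.10.2020 23:47:14")),
  end   = as_time(c("19.10.2020 11:35:24", "19.10.2020 22:59:53",
                    "19.10.2020 23:10:53", "19.10.2020 23:56:22")),
  type  = c("noise", "Afib", "Afib", "Afib")
)
PPG <- data.frame(
  start = as_time(c("19.10.2020 11:25:53", "19.10.2020 22:49:53", "19.10.2020 23:00:21",
                    "19.10.2020 23:42:04", "20.10.2020 00:02:14")),
  end   = as_time(c("19.10.2020 11:40:24", "19.10.2020 22:59:53", "19.10.2020 23:15:53",
                    "19.10.2020 23:54:38", "20.10.2020 00:19:26")),
  type  = c("noise", "Afib", "Afib", "Afib", "Afib")
)

res <- match_episodes(ECG, PPG)
res$TP   # pairs (2, 2), (3, 3), (4, 4)
res$FP   # 5
res$FN   # integer(0)
```

Setting noise pairs to 0 rather than NA keeps `hit` free of missing values, so `colSums()` and `rowSums()` need no `na.rm` argument.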