Carolin V
Carolin V

Reputation: 51

Calculate the minimum difference between multiple dates

Dear Stackflow Community,

My dataset consists of Patient ID and Dates where a specific laboratory value was above the threshold. Now I would like to know how to get the minimum difference of all those dates per patient.
My goal is, to identify patients, with two or more dates which should be in a 2-year window. According to my research plan, I would like to calculate a column to state if the minimum difference is not more than two years.

My dataset looks now similar to this:

    Patient_ID  Date1   Date2   Date3   Date4   Date5   Date6   Date7   Date8   Date9   Date10  Date11  Date12  Date13  Date14
1   1   13.05.19    13.09.07    13.05.09    13.01.11    13.09.12    13.05.14    13.01.16    13.09.17    14.09.17    15.09.17    16.09.17    17.09.17    18.09.17    19.09.17
2   2   13.07.18    13.11.06    13.07.08    13.03.10    13.11.11    13.07.13    13.03.15    13.11.16    13.09.07    13.07.98    13.05.89    13.03.80    13.01.71    13.11.61
3   3   13.09.17    13.01.06    13.09.07    13.05.09    13.01.11    13.09.12    13.05.14    13.01.16    13.11.06    13.09.97    13.07.88    13.05.79    13.03.70    13.01.61
4   4   13.11.16    13.03.05    13.11.06    13.07.08    13.03.10    13.11.11    13.07.13    13.03.15    13.01.06    13.11.96    13.09.87    13.07.78    13.05.69    13.03.60
5   5   13.01.16    13.05.04    13.01.06    13.09.07    13.05.09    13.01.11    13.09.12    13.05.14    13.03.05    13.01.96    13.11.86    13.09.77    13.07.68    
6   6   13.03.15    13.07.03    13.03.05    13.11.06    13.07.08    13.03.10    13.11.11    13.09.07    13.05.04    13.01.01    13.09.97    13.05.94    13.01.91    
7   7   13.05.14    13.09.02    13.05.04    13.01.06    13.09.07    13.05.09    13.01.11    13.11.06    13.07.03                    
8   8   13.07.13    13.11.01    13.07.03    13.03.05    13.11.06    13.07.08    13.03.10    13.01.06    13.09.02                    
9   9   13.09.12    13.01.01    13.09.02    13.09.07    13.09.12    13.09.17    13.09.22    13.03.05    13.11.01                    
10  10  13.11.11    13.03.00    13.11.01    13.11.06    13.11.11    13.11.16    13.11.21    13.05.04    13.01.01                    
11  11  13.01.11    13.05.99    13.01.01    13.01.06    13.01.11    13.01.16    13.01.21    13.07.03    13.03.00                    
12  12  13.03.10    13.07.98    13.03.00    13.03.05    13.03.10    13.03.15    13.05.19    13.09.02    13.05.99                    
13  13  13.05.09    13.09.97    13.05.99    13.05.04    13.09.07    13.05.14    13.07.18    13.11.01                        
14  14  13.07.08    13.11.96    14.05.99    13.07.03    13.11.06    13.07.13    13.09.17    13.01.01                        
15  15  13.09.07    13.01.96    15.05.99    13.09.02    13.01.06    13.09.12    13.11.16    13.03.00                        
16  16  13.11.06    13.03.95    16.05.99    13.11.01    13.03.05    13.11.11    13.01.16    13.05.99                        
17  17  13.01.06    13.05.94    17.05.99    13.01.01    13.05.04    13.01.11    13.03.15                            
18  18  13.03.05    13.07.93    18.05.99    13.03.00    13.07.03    13.03.10    13.05.14                            
19  19  13.05.04    13.09.92    19.05.99    13.05.99    13.09.02    13.05.09    13.07.13                            
20  20  13.07.03    13.11.91    20.05.99    14.05.99    13.11.01    13.07.08    13.09.12                            
21  21  13.09.02    13.01.91    21.05.99    15.05.99    13.01.01    13.09.07    13.11.11                            
22  22  13.11.01    13.03.90    22.05.99    16.05.99    13.03.00    13.11.06    13.01.11                            
23  23  13.01.01    13.05.89    23.05.99    17.05.99    13.05.99    13.01.06    13.03.10                            
24  24  13.03.00    13.07.88    24.05.99    18.05.99    14.05.99    13.03.05                                
25  25  13.05.99    13.09.87    25.05.99    19.05.99    

Best wishes and thank you very much for your help, Caro

Here is the code to access the small data set at the top.

structure(list(...1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), Patient_ID = c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25), Date1 = structure(c(1557705600, 1531440000, 
1505260800, 1478995200, 1452643200, 1426204800, 1399939200, 1373673600, 
1347494400, 1321142400, 1294876800, 1268438400, 1242172800, 1215907200, 
1189641600, 1163376000, 1137110400, 1110672000, 1084406400, 1058054400, 
1031875200, 1005609600, 979344000, 952905600, 926553600), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date2 = structure(c(1189641600, 1163376000, 
1137110400, 1110672000, 1084406400, 1058054400, 1031875200, 1005609600, 
979344000, 952905600, 926553600, 900288000, 874108800, 847843200, 
821491200, 795052800, 768787200, 742521600, 716342400, 689990400, 
663724800, 637286400, 611020800, 584755200, 558489600), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date3 = structure(c(1242172800, 1215907200, 
1189641600, 1163376000, 1137110400, 1110672000, 1084406400, 1058054400, 
1031875200, 1005609600, 979344000, 952905600, 926553600, 926640000, 
926726400, 926812800, 926899200, 926985600, 927072000, 927158400, 
927244800, 927331200, 927417600, 927504000, 927590400), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date4 = structure(c(1294876800, 1268438400, 
1242172800, 1215907200, 1189641600, 1163376000, 1137110400, 1110672000, 
1189641600, 1163376000, 1137110400, 1110672000, 1084406400, 1058054400, 
1031875200, 1005609600, 979344000, 952905600, 926553600, 926640000, 
926726400, 926812800, 926899200, 926985600, 927072000), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date5 = structure(c(1347494400, 1321142400, 
1294876800, 1268438400, 1242172800, 1215907200, 1189641600, 1163376000, 
1347494400, 1321142400, 1294876800, 1268438400, 1189641600, 1163376000, 
1137110400, 1110672000, 1084406400, 1058054400, 1031875200, 1005609600, 
979344000, 952905600, 926553600, 926640000, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date6 = structure(c(1399939200, 1373673600, 
1347494400, 1321142400, 1294876800, 1268438400, 1242172800, 1215907200, 
1505260800, 1478995200, 1452643200, 1426204800, 1399939200, 1373673600, 
1347494400, 1321142400, 1294876800, 1268438400, 1242172800, 1215907200, 
1189641600, 1163376000, 1137110400, 1110672000, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date7 = structure(c(1452643200, 1426204800, 
1399939200, 1373673600, 1347494400, 1321142400, 1294876800, 1268438400, 
1663027200, 1636761600, 1610496000, 1557705600, 1531440000, 1505260800, 
1478995200, 1452643200, 1426204800, 1399939200, 1373673600, 1347494400, 
1321142400, 1294876800, 1268438400, NA, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date8 = structure(c(1505260800, 1478995200, 
1452643200, 1426204800, 1399939200, 1189641600, 1163376000, 1137110400, 
1110672000, 1084406400, 1058054400, 1031875200, 1005609600, 979344000, 
952905600, 926553600, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date9 = structure(c(1505347200, 1189641600, 
1163376000, 1137110400, 1110672000, 1084406400, 1058054400, 1031875200, 
1005609600, 979344000, 952905600, 926553600, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Date10 = structure(c(1505433600, 900288000, 
874108800, 847843200, 821491200, 979344000, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date11 = structure(c(1505520000, 611020800, 
584755200, 558489600, 532224000, 874108800, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date12 = structure(c(1505606400, 321753600, 
295401600, 269136000, 242956800, 768787200, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date13 = structure(c(1505692800, 32572800, 
6134400, -20131200, -46396800, 663724800, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Date14 = structure(c(1505779200, -256694400, 
-282960000, -309398400, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), row.names = c(NA, -25L), class = c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 1

Views: 331

Answers (2)

mattyx17
mattyx17

Reputation: 826

If you're not concerned with efficiency, you can use the outer function. Lets say your dataset is call d

cols = colnames(d)[-1]  #remove id column
resultNames = outer(cols, cols, paste, sep="_") #column names
result = outer(2:ncol(d), 2:ncol(d), function(x, y) d[,x] - d[,y]) #do pairwise differences, ignoring first column 
colnames(result) <- resultNames

Now result will have your pairwise differences. I'm not sure if you want to flag all occurrences or the first, but you can use the which function.

This isn't the most efficient however as the running time is O(n^2). A more efficient algorithm is using divide and conquer but you would need to code an implementation of this

Upvotes: 1

Ben
Ben

Reputation: 30559

Here is one way to approach this using tidyverse.

First, put dates into long format (tidier). Then put the dates in chronological order with arrange. Then remove the NA.

After grouping by Patient_ID, you can calculate differences between dates, and take the minimum difference. Your diff column should be minimum difference between dates.

Does this match the expected outcome?

library(tidyverse)

df %>%
  pivot_longer(cols = -c(...1, Patient_ID), names_to = "number", values_to = "date", names_pattern = "Date(\\d+)") %>%
  arrange(Patient_ID, date) %>%
  drop_na() %>%
  group_by(Patient_ID) %>%
  mutate(diff = c(NA, diff(date))) %>%
  slice_min(diff, with_ties = FALSE)

Output

# A tibble: 25 x 5
# Groups:   Patient_ID [25]
    ...1 Patient_ID number date                 diff
   <dbl>      <dbl> <chr>  <dttm>              <dbl>
 1     1          1 9      2017-09-14 00:00:00     1
 2     2          2 9      2007-09-13 00:00:00   304
 3     3          3 9      2006-11-13 00:00:00   304
 4     4          4 3      2006-11-13 00:00:00   304
 5     5          5 9      2005-03-13 00:00:00   304
 6     6          6 3      2005-03-13 00:00:00   304
 7     7          7 9      2003-07-13 00:00:00   303
 8     8          8 3      2003-07-13 00:00:00   303
 9     9          9 5      2012-09-13 00:00:00     0
10    10         10 5      2011-11-13 00:00:00     0
11    11         11 5      2011-01-13 00:00:00     0
12    12         12 5      2010-03-13 00:00:00     0
13    13         13 3      1999-05-13 00:00:00   607
14    14         14 1      2008-07-13 00:00:00   608
15    15         15 8      2000-03-13 00:00:00   303
16    16         16 3      1999-05-16 00:00:00     3
17    17         17 4      2001-01-13 00:00:00   607
18    18         18 4      2000-03-13 00:00:00   300
19    19         19 3      1999-05-19 00:00:00     6
20    20         20 3      1999-05-20 00:00:00     6
21    21         21 3      1999-05-21 00:00:00     6
22    22         22 3      1999-05-22 00:00:00     6
23    23         23 4      1999-05-17 00:00:00     4
24    24         24 4      1999-05-18 00:00:00     4
25    25         25 4      1999-05-19 00:00:00     6

Upvotes: 2

Related Questions