Bob
Bob

Reputation: 1

Mapping multiple IDs using R

The idea is as follows. Every patient has a unique patient id, which we call hidenic_id. However this patient may be admitted to the hospital multiple times. On the other hand every entry has unique emtek_id.

Patient 110380 was admitted to the hospital 4/14/2001 11:08 and then transferred through the hospital and discharged on 4/24/2001 18:16. Now this patient again admitted on 5/11/2001 23:24 because he has different emtek_id now. He is discharged from the hospital on 5/25/2001 16:26. So you need to assign correct emtek_ids by checking the dates. If the date in the combined file is within the admission and discharge time period (or very close like 24 hours) we can assign that emtek_id.

How can I assign different emtek_IDs to entries with hidenic_id and admit time?

Upvotes: 0

Views: 599

Answers (2)

ctde
ctde

Reputation: 91

I had a couple ideas worth sharing.

First, make emtek_id from hidenic_id and date. Second, make the emtek_id logical for parsing, e.g., emtek_id@dataTime. Third, make the database a global vector. Depending on memory limits, there has to be a faster way than this, but it might give you a few ideas.

The main problems are handling NA values and incorrect hidenic_id, validating hidenic_id(s), and padding the IDs if you don't characters leading (which would be a quick fix). Lastly, how do you want to handle input that's incorrect but not NA/null? For instance, say you input "ID" instead of "ID12345", do you want to treat that as a call to assign a new value or prompt for a correct input XOR NA value? I will assume you only feed it correct ID inputs or NA values, but this is my trivializing assumption.

Here's some pseudo-code to start the idea. You choose how to store the data (eg. csv file then use data.table::fread()):

#this file's name is "make.hidenic_id.R"
library(data.table)
library(stringr)
set.seed(101)
#one might one a backup written, perhaps conditionally updating it every hour or so.
database.hidenic_id <<-data.table::fread("database.filename.hidenic_id.csv")
database.emtek_id   <<-data.table::fread("database.filename.emtek_id.csv") 

make.hidenic_Id = function(in.hidenic_id){
            if(is.na(in.hidenic_id) | !(in.hidenic_id %in% database.hidenic_id)){
                new.hidenic_id=NA
                #conditionally make new hidenic_id
                while( new.hidenic_id %in% database.hidenic_id){
                    new.hidenic_id = paste0("ID",str_pad(sample.int(99999, 1),5,pad=0))
                }
                #make new emtek_id
                new.emtek_id <- paste0(new.hidenic_id,  sep="@",  str_sub(Sys.time(),1,16))
                
                #update databases; e.g., c(database.emtek_id, new.emtek_id)
                database.hidenic_id <<- c(database.hidenic_id, new.hidenic_id)
                database.emtek_id   <<- c(database.emtek_id,   new.emtek_id)
            }else{
                new.emtek_id <- paste0(in.hidenic_id,  sep="@",  str_sub(Sys.time(),1,16))
              # update database.emtek_id 
              database.emtek_id   <<- c(database.emtek_id,   new.emtek_id)  
            }
            return(new.emtek_id)
}
temp = readline(prompt="Enter hidenic_id OR type \"NA\": ")
data.table::fwrite(database.emtek_id,  "database.filename.emtek_id.csv") 
data.table::fwrite(database.hidenic_id,"database.filename.hidenic_id.csv") 

and call the file with

source("make.hidenic_id.R") 

There are a lot of "good-practice" things I don't do to manage poor input data or optimizing searching, but this is a strong start. Some other good-practice would be to have longer integers or a different leading string, but you never said we could use input value to make the IDs.

You could say this was inspired by the census since everything is just one massive string per geographic ID variable.

Upvotes: 1

cylondude
cylondude

Reputation: 1888

I was intrested in your problem so I created some mock data and tried to solve the problem but I ran into some confusion myself and then posted my question, which I think is what you are asking but more general. You can see the response here: How can I tell if a time point exists between a set of before and after times

My post generates what I believe is what you are starting with and the checked answer is what I believe you are looking for. The full code is below. You will need to install zoo and IRanges. Also, I did this in version 2.15.3. IRanges did not install properly in 3.0.0.

## package installation
source("http://bioconductor.org/biocLite.R")
  biocLite("IRanges")
install.packages("zoo")


## generate the emtek and hidenic file data
library(zoo)
date_string <- paste("2001", sample(12, 10, 3), sample(28,10), sep = "-")
time_string <- c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
                 "23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26")

entry_emtek <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
entry_emtek <- entry_emtek[order(entry_emtek)]
exit_emtek <- entry_emtek + 3600 * 24
emtek_file <- data.frame(emtek_id = 1:10, entry_emtek, exit_emtek)

hidenic_id <- 110380:110479
date_string <- paste("2001", sample(12, 100, replace = TRUE), sample(28,100, replace = T), sep = "-")
time_string <- rep(c("23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26",
                 "23:03:20", "22:29:56", "01:03:30", "18:21:03", "16:56:26"),10)
hidenic_time <- strptime(paste(date_string, time_string), "%Y-%m-%d %H:%M:%S")
hidenic_time <- hidenic_time[order(hidenic_time)]
hidenic_file <- data.frame(hidenic_id, hidenic_time)

## Find the intersection of emtek and hidenic times.  This part was done by user: agstudy
library(IRanges)
## create a time intervals 
subject <- IRanges(as.numeric(emtek_file$entry_emtek),
        as.numeric(emtek_file$exit_emtek))
## create a time intervals (start=end here)
query <- IRanges(as.numeric(hidenic_file$hidenic_time),
        as.numeric(hidenic_file$hidenic_time))
## find overlaps and extract rows (both time point and intervals)  
emt.ids <- subjectHits(findOverlaps(query,subject))
hid.ids <- queryHits(findOverlaps(query,subject))
cbind(hidenic_file[hid.ids,],emtek_file[emt.ids,])

Upvotes: 0

Related Questions