I have an into-the-weeds question, and I haven't found a question similar enough to figure it out on my own. In my experience there is always a solution, but I haven't found it despite a lot of searching through for-loop questions and possibly relevant purrr map questions.
I do health data research. I have a complete master list of the unique ICD codes used to classify patient diagnoses; there are thousands of codes. I want to add variables to that data frame that flag specific conditions I'm isolating from data sets. For example, the master data frame contains every diagnosis under the sun, and I have a fixed definition of the term diabetes. I want to loop through multiple term definitions (diabetes, heart failure, kidney disease, etc.), each stored as a vector of codes, and for each one create a new column in my master data frame with 1 for codes that are part of the definition and 0 for codes that are not. The issue is that I have hundreds of defined conditions and thousands of potential codes, so I certainly need to do this programmatically (particularly when I need to adjust or add a term definition and re-run).
So far, I have tried the following:
library(dplyr)
terms <- c('acute_kidney_failure', 'acute_limb_ischemia')
for (i in terms) {
  definitions_master.df <- ontology_master.df %>%
    mutate(i = if_else(ONTOLOGY_CODE %in% i, 1, 0))
}
I've kept it simple here on purpose, but there are many, many terms. Each term is an object holding its codes, for example acute_kidney_failure <- c(1,2,3) and acute_limb_ischemia <- c(4,5,6). Ideally each new column should have the same name as its term, since I'll need to identify the diagnosis and I don't want the hassle of renaming hundreds of columns if it can be helped.
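A minimal sketch of why the loop above misbehaves, using a small hypothetical stand-in for ontology_master.df (the data are made up for illustration). Inside mutate(), the name on the left of `=` is taken literally, so the new column is called `i`; and `ONTOLOGY_CODE %in% i` compares each code with the string stored in `i`, not with the vector of that name. Each pass also restarts from ontology_master.df, so only the last iteration would survive in any case.

```r
library(dplyr)

# Hypothetical stand-in data, only for illustration
ontology_master.df <- data.frame(ONTOLOGY_CODE = 1:6)
acute_kidney_failure <- c(1, 2, 3)
acute_limb_ischemia <- c(4, 5, 6)
terms <- c('acute_kidney_failure', 'acute_limb_ischemia')

for (i in terms) {
  # `i = ...` creates a column literally named "i";
  # `%in% i` tests membership in the string 'acute_...', not in the vector;
  # starting from ontology_master.df each time discards earlier passes.
  definitions_master.df <- ontology_master.df %>%
    mutate(i = if_else(ONTOLOGY_CODE %in% i, 1, 0))
}

names(definitions_master.df)  # "ONTOLOGY_CODE" "i"
definitions_master.df$i       # 0 0 0 0 0 0
```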
I've read many accounts that hate on for loops, and I'm by no means tied to them, but a loop is the only approach I can think of. In the end I want to reference this master definitions table to create variables in specific data sets, and I will use it over and over. I'm in it for the long haul, so I want to do this programmatically, elegantly, and reproducibly. I'm willing to learn any new packages or functions necessary, but I prefer to stay in the tidyverse where I can because I find it less likely to break my code. Can anyone help?
EDIT
Minimal reproducible example below. Sorry it took a while; I had to finish the semester.
The medical condition definitions look like this (I have many of them):
library(dplyr)
acute_kidney_failure <- c('1', '3', '5')
acute_limb_ischemia <- c('2', '6', '8')
The master data frame looks like the one below. Its identifying_code column holds the master list of all codes for medical conditions, including the 1, 3, 5 and 2, 6, 8 above.
definitions.df <- data.frame(identifying_code = c(1:10))
Instead of creating the variables by hand as below, I'd like some sort of loop or mapping function that goes through the definitions, checks the identifying_code variable, and creates a new column for each definition, resulting in the data frame below. I particularly want it to be programmatic because we add new definitions all the time, and I'd like to be able to add a new one, run it again, and have the new column generated without much hassle. It doesn't matter if I have to rebuild the entire definitions data frame each time, since it will just be a function. This is a simple example, but at scale it would be impossible to do by hand.
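One way to get that "add a definition, re-run" workflow is to keep every definition in a single named list rather than in separate objects. A minimal sketch, assuming that layout; flag_definitions() is a hypothetical helper, not a function from any package:

```r
# Hypothetical helper (not from any package): returns the codes plus one
# 0/1 column per definition, named after the list element.
flag_definitions <- function(codes, definitions) {
  out <- data.frame(identifying_code = codes)
  # For each definition, %in% marks which codes belong to it (TRUE/FALSE)
  # and as.integer() turns that into 1/0. Assigning the resulting list to
  # out[names(definitions)] adds one column per definition.
  out[names(definitions)] <- lapply(
    definitions,
    function(def_codes) as.integer(codes %in% def_codes)
  )
  out
}

# All definitions kept together; adding a condition means adding one
# element here and re-running.
definitions <- list(
  acute_kidney_failure = c('1', '3', '5'),
  acute_limb_ischemia  = c('2', '6', '8')
)

definitions.df <- flag_definitions(1:10, definitions)
```

Column names come from the list names, so nothing has to be renamed by hand.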
definitions.df <- definitions.df %>%
  mutate(acute_kidney_failure = c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0)) %>%
  mutate(acute_limb_ischemia = c(0, 1, 0, 0, 0, 1, 0, 1, 0, 0))
I've tried the loop I mentioned earlier but couldn't get it to run. Anyone have thoughts?
Assuming the terms are the names of your definition vectors, you could do the following.
As you were attempting, you can use %in% to check whether each diagnosis code is contained in a given vector. Using get() retrieves the object (in this case a vector) whose name matches the character string in terms. %in% returns TRUE or FALSE depending on whether each code is in the vector; prefixing the result with a unary plus (+) converts it to 1 or 0.
The result is assigned to a new column with the same name as the character string in terms.
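A quick illustration of the three pieces described above, using the acute_kidney_failure definition from the question:

```r
acute_kidney_failure <- c('1', '3', '5')
i <- 'acute_kidney_failure'

get(i)               # "1" "3" "5": the vector whose name is stored in i
1:5 %in% get(i)      # TRUE FALSE TRUE FALSE TRUE
+(1:5 %in% get(i))   # 1 0 1 0 1 (unary plus turns logical into integer)
```

Note that the definitions are character ('1') while identifying_code is integer (1); %in% still matches them because match() coerces both to a common type (character) before comparing.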
for (i in terms) {
  definitions.df[[i]] <- +(definitions.df$identifying_code %in% get(i))
}
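If you would rather stay in dplyr, as the question prefers, a minimal sketch of the same loop follows. It relies on dplyr's support for `!!name := value` to use the value of i (not the letter "i") as the column name, and it updates definitions.df on every pass so earlier columns are kept.

```r
library(dplyr)

acute_kidney_failure <- c('1', '3', '5')
acute_limb_ischemia  <- c('2', '6', '8')
terms <- c('acute_kidney_failure', 'acute_limb_ischemia')
definitions.df <- data.frame(identifying_code = 1:10)

for (i in terms) {
  codes <- get(i)  # the vector whose name is stored in i
  # `!!i :=` injects the string in i as the new column's name;
  # reassigning definitions.df keeps the columns from earlier passes
  definitions.df <- definitions.df %>%
    mutate(!!i := as.integer(identifying_code %in% codes))
}
```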
There may be better ways to organize your diagnosis definitions and data. Let me know if you have any questions or if I can help further.
Output
   identifying_code acute_kidney_failure acute_limb_ischemia
1                 1                    1                   0
2                 2                    0                   1
3                 3                    1                   0
4                 4                    0                   0
5                 5                    1                   0
6                 6                    0                   1
7                 7                    0                   0
8                 8                    0                   1
9                 9                    0                   0
10               10                    0                   0
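As one example of the alternative organization mentioned above, the definitions could live in a single long table with one row per condition and code, which is easy to extend and to keep in a file. A minimal sketch, assuming dplyr and tidyr are installed; definitions_long and master.df are hypothetical names:

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format definitions: one row per (condition, code) pair
definitions_long <- tibble(
  condition        = c(rep('acute_kidney_failure', 3), rep('acute_limb_ischemia', 3)),
  identifying_code = c(1L, 3L, 5L, 2L, 6L, 8L)
)

master.df <- data.frame(identifying_code = 1:10)

# Widen to one 0/1 column per condition
flags <- definitions_long %>%
  distinct() %>%  # guard against duplicated (condition, code) pairs
  mutate(flag = 1L) %>%
  pivot_wider(names_from = condition, values_from = flag, values_fill = 0L)

definitions.df <- master.df %>%
  left_join(flags, by = 'identifying_code') %>%
  # codes that belong to no definition come back as NA from the join
  mutate(across(-identifying_code, ~ coalesce(.x, 0L)))
```

Adding a definition then means appending rows to the long table rather than creating a new R object.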