James Irwin
James Irwin

Reputation: 123

Vectorize calculation across relational dataframes in R

Is it possible in R to vectorize a calculation on data in a dataframe, where one criteria on which the calculation is performed comes from an external dataframe? This can be performed using a for loop, however it is slow.

The full task involves asking questions of 15 years of medical laboratory data in relational format. For example, what is the lowest haemoglobin level recorded for a patient in the three months following a surgical procedure? This from two tables: one with dates of surgery (~ 6000, often multiple per patient) and one of dated haemoglobin levels (~200,000, multiple per patient). A loop as below takes ~30 minutes per query.

In this MWE data is in two tables and is linked by an index.

##create two dataframes
a<-c("ID1","ID2","ID3","ID2","ID1")
b<-c(1,2,3,4,5)
c<-as.Date(c("2005-01-01","2002-01-01","2003-01-01","2004-01-01","2001-01-01"))
df.1<-cbind.data.frame(a,b,c,stringsAsFactors=FALSE)

d<-c("ID1","ID2","ID1")
e<-as.Date(c("2002-02-01","2001-02-01","2000-01-01"))
df.2<-cbind.data.frame(d,e,stringsAsFactors=FALSE)        

>df.1
   a  b     c
1 ID1 1 2005-01-01
2 ID2 2 2002-01-01
3 ID3 3 2003-01-01
4 ID2 4 2004-01-01
5 ID1 5 2001-01-01
>df.2
   d       e
1 ID1 2002-02-01
2 ID2 2001-02-01
3 ID1 2000-01-01


out<-rep(NA,length(df.2$d))
for(i in 1:length(df.2$d)){
out[i]<-max(df.1$b[df.1$a==df.2$d[i] & df.1$c>df.2$e[i]])
}


> cbind(df.2,out)
   d      e       out
1 ID1 2002-02-01   1
2 ID2 2001-02-01   4
3 ID1 2000-01-01   5

Upvotes: 1

Views: 73

Answers (1)

rawr
rawr

Reputation: 20811

To answer your question, you can vectorize a calculation in r with Vectorize.

However, I'm not sure what "slow" means here. And there are probably better ways to accomplish your task, but I would rather read a word problem than code.

##create two dataframes
a<-c("ID1","ID2","ID3","ID2","ID1")
b<-c(1,2,3,4,5)
c<-as.Date(c("2005-01-01","2002-01-01","2003-01-01","2004-01-01","2001-01-01"))
df.1<-cbind.data.frame(a,b,c,stringsAsFactors=FALSE)

d<-c("ID1","ID2","ID1")
e<-as.Date(c("2002-02-01","2001-02-01","2000-01-01"))
df.2<-cbind.data.frame(d,e,stringsAsFactors=FALSE)        

f <- function(i)
  ## your code here
  max(df.1$b[df.1$a==df.2$d[i] & df.1$c>df.2$e[i]])
vf <- Vectorize(f)

vf(1:3)
# [1] 1 4 5

Upvotes: 1

Related Questions