Reputation: 11
In R, I have 3 dataframes that are similar to the sample versions provided below. The first, Data, is the primary data set. The TW and UW dataframes share a variable with Data (MN-mapping_for_N) and then have 1000 different draw values for each level of that variable (N48, etc.). I provided 3 draws here.
Data<-matrix(c(4720,44.29,"Work or Private Clinic","N48",2659,55.05,"Hospital","N1",1612,59.99,"No Care","N48"),ncol = 4,byrow=TRUE)
colnames(Data)<-c("studyid", "Pred_ex", "wherecare", "MN-mapping_for_N")
# matrix() stores every value as character; data.frame() also makes the names
# syntactic, so "MN-mapping_for_N" becomes MN.mapping_for_N
Data<-data.frame(Data)
TW<-matrix(c("N48",0.07,0.08,0.09,"N1",0.10,0.11,0.12,"N2",0.02,0.03,0.04,"N3",0.04,0.05,0.06),ncol = 4, byrow = TRUE)
colnames(TW)<-c("MN-mapping_for_N","draw1","draw2","draw3")
TW<-data.frame(TW)
UW<-matrix(c("N48",0.71,0.81,0.91,"N1",0.11,0.111,0.131,"N2",0.021,0.031,0.041,"N3",0.041,0.051,0.061),ncol = 4, byrow = TRUE)
colnames(UW)<-c("MN-mapping_for_N","draw1","draw2","draw3")
UW<-data.frame(UW)
My goal is to create a new column with values from a randomly selected draw column of the UW or TW data. Which data frame to draw from depends on the value in Data$wherecare: rows with "No Care" should use UW, and all other rows should use TW.
I have been using a mix of dplyr and the match function, combined with a couple of functions of my own. Currently it looks like this:
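library(dplyr)
drawselect<-function(x) {
# column 1 is the mapping ID; the draw columns are draw1 ... draw(ncol(x) - 1)
samplepick<-sample(seq_len(ncol(x) - 1),1)
select(x,1,num_range("draw",samplepick))
}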
DALY_FX_LT_NR<-function(x){
draw_T_DW<-drawselect(TW)
draw_UT_DW<-drawselect(UW)
drawnames.TW<-colnames(draw_T_DW)
drawnames.UT<-colnames(draw_UT_DW)
UT.draw<-drawnames.UT[2]
T.draw<-drawnames.TW[2]
print(UT.draw)
print(T.draw)
newdf<-x %>% mutate(DW=NA)
for(i in 1:nrow(newdf)){
if(newdf$wherecare[i]!= "No Care"){
newdf$DW=draw_T_DW[,2][match(newdf$`MN-mapping_for_N`,draw_T_DW$`MN-mapping_for_N`)]
next
}else if(newdf$wherecare[i]=="No Care"){
newdf$DW=draw_UT_DW[,2][match(newdf$`MN-mapping_for_N`,draw_UT_DW$`MN-mapping_for_N`[i])]
}
}
newdf
}
The code runs, but I can't get it to actually iterate row by row so that it pulls draw values from the correct data frame (i.e. UW or TW, after going through the drawselect function).
So what I get looks like:
studyid  Pred_ex  wherecare               MN-mapping_for_N  DW
-------  -------  ----------------------  ----------------  ----
4720     44.29    Work or Private Clinic  N48               0.08
2659     55.05    Hospital                N1                0.11
1612     59.99    No Care                 N48               0.08
When I should be getting:
studyid  Pred_ex  wherecare               MN-mapping_for_N  DW
-------  -------  ----------------------  ----------------  ----
4720     44.29    Work or Private Clinic  N48               0.08
2659     55.05    Hospital                N1                0.11
1612     59.99    No Care                 N48               0.81
The key difference is the 0.81 in the lower right corner. That is not a big deal in the sample data, but the actual data is several hundred rows long, so I would like the function to "decide correctly" which data set to pull from. This value could be 0.71, 0.81, or 0.91; any of the UW values for N48 will work.
The ultimate goal is to use that value in a calculation by multiplying it by the Pred_ex column, which I can do, and then rerun this function many times to bootstrap the data. Until I can get these if statements to work correctly, though, I can't do that. I have also tried using dplyr::left_join to match these and had similar problems with the conditional statements not working. I think the match function as it's written will work better, but I'm certainly open to anything.
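For the match-based route, one way to avoid the if/else branches entirely is to stack TW and UW into a single lookup table tagged with its source, then match on a combined "source + mapping" key so the wherecare condition becomes part of the key. This is a minimal sketch under assumptions: the sample data are rebuilt directly as data frames with numeric draws (rather than via matrix()), and lookup, src, and draw_col are hypothetical helper names introduced here. The same idea should carry over to dplyr::left_join by joining on both the source tag and MN.mapping_for_N.

```r
# Sample data rebuilt directly as data frames (assumption: numeric draws,
# character text columns), so matching and arithmetic behave predictably
Data <- data.frame(studyid = c(4720, 2659, 1612),
                   Pred_ex = c(44.29, 55.05, 59.99),
                   wherecare = c("Work or Private Clinic", "Hospital", "No Care"),
                   MN.mapping_for_N = c("N48", "N1", "N48"),
                   stringsAsFactors = FALSE)
TW <- data.frame(MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
                 draw1 = c(0.07, 0.10, 0.02, 0.04),
                 draw2 = c(0.08, 0.11, 0.03, 0.05),
                 draw3 = c(0.09, 0.12, 0.04, 0.06),
                 stringsAsFactors = FALSE)
UW <- data.frame(MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
                 draw1 = c(0.71, 0.11, 0.021, 0.041),
                 draw2 = c(0.81, 0.111, 0.031, 0.051),
                 draw3 = c(0.91, 0.131, 0.041, 0.061),
                 stringsAsFactors = FALSE)

# Stack both lookup tables, tagging each row with the table it came from
lookup <- rbind(data.frame(source = "TW", TW, stringsAsFactors = FALSE),
                data.frame(source = "UW", UW, stringsAsFactors = FALSE))

# Rows with "No Care" use UW; every other row uses TW
src <- ifelse(Data$wherecare == "No Care", "UW", "TW")

# Pick one draw column at random for this run (draw1, draw2, ...)
draw_col <- sample(grep("^draw", names(TW), value = TRUE), 1)

# A single match on "source + mapping" replaces the if/else branches
Data$DW <- lookup[[draw_col]][match(paste(src, Data$MN.mapping_for_N),
                                    paste(lookup$source, lookup$MN.mapping_for_N))]
Data
```

Because the whole column is filled in one vectorised step, there is no loop index to get wrong, and every row uses the same randomly chosen draw column.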
Any help is greatly appreciated.
Also, thanks to everyone on Stack Overflow in general; reading your answers to other questions is the main reason I have gotten this far.
Upvotes: 0
Views: 962
Reputation: 1709
So you don't need a new function (I kept drawselect); you can just do the following:
for (i in 1:nrow(Data)){
if (Data$wherecare[i] != "No Care"){
Data$DW[i]<- drawselect(TW)[which(drawselect(TW)$MN.mapping_for_N == as.character(Data$MN.mapping_for_N[i])), 2]
} else {
Data$DW[i]<- drawselect(UW)[which(drawselect(UW)$MN.mapping_for_N == as.character(Data$MN.mapping_for_N[i])), 2]
}
}
> Data
studyid Pred_ex wherecare MN.mapping_for_N DW
1 4720 44.29 Work or Private Clinic N48 0.08
2 2659 55.05 Hospital N1 0.11
3 1612 59.99 No Care N48 0.81
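The key change from the loop in the question is that each assignment targets one element, Data$DW[i], instead of the whole column. Writing newdf$DW = ... inside the loop replaces every row on each pass, so the final column comes from a single branch rather than a row-by-row mix. A toy illustration of the difference, using a hypothetical data frame df:

```r
# Hypothetical toy data frame, not the question's data
df <- data.frame(id = 1:3)

# Buggy pattern: df$x gets a whole-column assignment on every pass,
# so after the loop every row holds the value from the last iteration
df$x <- NA
for (i in seq_len(nrow(df))) {
  df$x <- i * 10
}
df$x  # 30 30 30

# Fixed pattern: index the target with [i] so only row i changes
df$y <- NA
for (i in seq_len(nrow(df))) {
  df$y[i] <- i * 10
}
df$y  # 10 20 30
```

Note also that because drawselect() is called inside the loop, each row in the loop above is filled from its own randomly sampled draw column; the function version below samples one column per call.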
If you want to wrap everything in a function (including drawselect), try something along the lines of the following:
DALY_FX_LT_NR<-function(x, y, z){ #x would be Data, y would be TW, z would be UW
# draw columns are draw1 ... draw(ncol(y) - 1), so sample from that whole range
samplepick<-sample(seq_len(ncol(y)-1),1)
for (i in 1:nrow(x)){
if (x$wherecare[i] != "No Care"){
x$DW[i]<- y[which(y$MN.mapping_for_N==as.character(x$MN.mapping_for_N[i])), paste0("draw", samplepick)]
} else {
x$DW[i]<- z[which(z$MN.mapping_for_N==as.character(x$MN.mapping_for_N[i])), paste0("draw", samplepick)]
}
}
return(x)
}
> DALY_FX_LT_NR(x = Data, y = TW, z = UW)
studyid Pred_ex wherecare MN.mapping_for_N DW
1 4720 44.29 Work or Private Clinic N48 0.09
2 2659 55.05 Hospital N1 0.12
3 1612 59.99 No Care N48 0.91
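To get to the bootstrap described in the question, the function can be rerun with replicate() and each DW multiplied by Pred_ex. This is a sketch under a few assumptions: the answer's function is repeated so the block runs on its own, with DW initialised and the sampling range covering draw1 onward; the sample data are rebuilt with numeric columns; and the per-row product DW * Pred_ex stands in for the real calculation, which the question does not spell out. With the question's matrix() construction every column is character, so Pred_ex would need as.numeric(as.character(...)) before multiplying. The names n_boot and boot are illustrative.

```r
Data <- data.frame(studyid = c(4720, 2659, 1612),
                   Pred_ex = c(44.29, 55.05, 59.99),
                   wherecare = c("Work or Private Clinic", "Hospital", "No Care"),
                   MN.mapping_for_N = c("N48", "N1", "N48"),
                   stringsAsFactors = FALSE)
TW <- data.frame(MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
                 draw1 = c(0.07, 0.10, 0.02, 0.04),
                 draw2 = c(0.08, 0.11, 0.03, 0.05),
                 draw3 = c(0.09, 0.12, 0.04, 0.06),
                 stringsAsFactors = FALSE)
UW <- data.frame(MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
                 draw1 = c(0.71, 0.11, 0.021, 0.041),
                 draw2 = c(0.81, 0.111, 0.031, 0.051),
                 draw3 = c(0.91, 0.131, 0.041, 0.061),
                 stringsAsFactors = FALSE)

# The answer's function: one draw column per call, chosen from draw1 ... draw(ncol(y) - 1)
DALY_FX_LT_NR <- function(x, y, z) {  # x = Data, y = TW, z = UW
  samplepick <- sample(seq_len(ncol(y) - 1), 1)
  draw_name <- paste0("draw", samplepick)
  x$DW <- NA_real_
  for (i in seq_len(nrow(x))) {
    if (x$wherecare[i] != "No Care") {
      x$DW[i] <- y[which(y$MN.mapping_for_N == x$MN.mapping_for_N[i]), draw_name]
    } else {
      x$DW[i] <- z[which(z$MN.mapping_for_N == x$MN.mapping_for_N[i]), draw_name]
    }
  }
  x
}

set.seed(2024)  # reproducible draws
n_boot <- 1000
# Each replicate reruns the function and keeps DW * Pred_ex for every row
boot <- replicate(n_boot, {
  res <- DALY_FX_LT_NR(x = Data, y = TW, z = UW)
  res$DW * res$Pred_ex
})
dim(boot)  # 3 1000: one row per study row, one column per replicate

# For example, a 95% percentile interval for each study row
apply(boot, 1, quantile, probs = c(0.025, 0.975))
```

Each column of boot is one bootstrap replicate, so summaries across replicates are row-wise apply() calls.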
Upvotes: 1