user8005527

Reputation: 11

R: How do I apply a for loop across multiple data frames?

In R, I have three data frames similar to the sample versions below. Data is the primary data set. The TW and UW data frames share a key variable with Data (MN-mapping_for_N) and hold 1000 draw values for each key (N48, etc.); I have included only 3 draws here.

Data<-matrix(c(4720,44.29,"Work or Private Clinic","N48",2659,55.05,"Hospital","N1",1612,59.99,"No Care","N48"),ncol = 4,byrow=TRUE)
colnames(Data)<-c("studyid", "Pred_ex", "wherecare", "MN-mapping_for_N")
Data<-data.frame(Data)


TW<-matrix(c("N48",0.07,0.08,0.09,"N1",0.10,0.11,0.12,"N2",0.02,0.03,0.04,"N3",0.04,0.05,0.06),ncol = 4, byrow = TRUE)
colnames(TW)<-c("MN-mapping_for_N","draw1","draw2","draw3")
TW<-data.frame(TW)

UW<-matrix(c("N48",0.71,0.81,0.91,"N1",0.11,0.111,0.131,"N2",0.021,0.031,0.041,"N3",0.041,0.051,0.061),ncol = 4, byrow = TRUE)
colnames(UW)<-c("MN-mapping_for_N","draw1","draw2","draw3")
UW<-data.frame(UW)

My goal is to create a new column holding values from a randomly selected draw column of the UW and TW data. Which data frame to draw from depends on the value in Data$wherecare.

I have been using a mix of dplyr and the match function, combined with a couple of functions of my own. Currently this looks like:

drawselect<-function(x) {
  samplepick<-sample(2:1001,1)
  select(x,1,num_range("draw",samplepick))
}

DALY_FX_LT_NR<-function(x){
  draw_T_DW<-drawselect(TW)
  draw_UT_DW<-drawselect(UW)
  drawnames.TW<-colnames(draw_T_DW)
  drawnames.UT<-colnames(draw_UT_DW)
  UT.draw<-drawnames.UT[2]
  T.draw<-drawnames.TW[2]
  print(UT.draw)
  print(T.draw)
  newdf<-x %>% mutate(DW=NA)
  for(i in 1:nrow(newdf)){
    if(newdf$wherecare[i]!= "No Care"){
      newdf$DW=draw_T_DW[,2][match(newdf$`MN-mapping_for_N`,draw_T_DW$`MN-mapping_for_N`)]
      next
    }else if(newdf$wherecare[i]=="No Care"){
      newdf$DW=draw_UT_DW[,2][match(newdf$`MN-mapping_for_N`,draw_UT_DW$`MN-mapping_for_N`[i])]
    }
  }
  newdf
}

The code runs, but I can't get it to iterate row by row so that it pulls draw values from the correct data frame (i.e. UW or TW, after going through the drawselect function).

So what I get looks like:

studyid   Pred_ex        wherecare         MN-mapping_for_N     DW
--------- --------- ---------------------- ------------------ ------
  4720      44.29   Work or Private Clinic        N48          0.08
  2659      55.05          Hospital                N1          0.11
  1612      59.99          No Care                N48          0.08

When I should be getting:

studyid   Pred_ex        wherecare         MN-mapping_for_N     DW
--------- --------- ---------------------- ------------------ ------
  4720      44.29   Work or Private Clinic        N48          0.08
  2659      55.05          Hospital                N1          0.11
  1612      59.99          No Care                N48          0.81

The key difference is the 0.81 in the lower right corner. It is not a big deal in the sample data, but the actual data is several hundred rows long, so I would like the function to "decide correctly" which data set to pull from. This value could be 0.71, 0.81, or 0.91; any of the UW values for N48 will work.

The ultimate goal is to use that value in a calculation by multiplying it by the Pred_ex column, which I can do, and then rerun this function many times to bootstrap the data. Until I can get these if statements to work correctly, I can't do that. I have also tried using dplyr::left_join to match these and had similar problems with the conditional statements not working. I think the match function as it's written will work better, but I'm certainly open to anything.

Any help is greatly appreciated.

Also, thanks to everyone on Stack Overflow in general; reading your answers to other questions is the main reason I have gotten this far.

Upvotes: 0

Views: 962

Answers (1)

Yannis Vassiliadis

Reputation: 1709

You don't need a new function (I kept drawselect); you can just do the following:

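# Note: drawselect() is called afresh for every row, so each row may end up
# using a different randomly chosen draw column.
for (i in seq_len(nrow(Data))){
    if (Data$wherecare[i] != "No Care"){
        # any row other than "No Care" takes its value from TW
        Data$DW[i]<- drawselect(TW)[which(drawselect(TW)$MN.mapping_for_N == as.character(Data$MN.mapping_for_N[i])), 2]
    } else {
        # "No Care" rows take their value from UW
        Data$DW[i]<- drawselect(UW)[which(drawselect(UW)$MN.mapping_for_N == as.character(Data$MN.mapping_for_N[i])), 2]
    }
}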

> Data
  studyid Pred_ex              wherecare MN.mapping_for_N   DW
1    4720   44.29 Work or Private Clinic              N48 0.08
2    2659   55.05               Hospital               N1 0.11
3    1612   59.99                No Care              N48 0.81
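
For what it's worth, the loop in the question assigns to the whole `DW` column (`newdf$DW = ...`) on every pass rather than to `DW[i]`, which is likely why the result does not vary row by row. The same lookup can also be done without a loop. Below is a minimal base R sketch, assuming the sample values from the question and the `MN.mapping_for_N` column name that `data.frame()` produces; `assign_dw` and its arguments are hypothetical names, and the draw column is passed in explicitly so the result is reproducible.

```r
# Sample data from the question, built directly with data.frame() so that
# numeric columns stay numeric (assumption: same values as in the question).
Data <- data.frame(
  studyid = c(4720, 2659, 1612),
  Pred_ex = c(44.29, 55.05, 59.99),
  wherecare = c("Work or Private Clinic", "Hospital", "No Care"),
  MN.mapping_for_N = c("N48", "N1", "N48"),
  stringsAsFactors = FALSE
)
TW <- data.frame(
  MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
  draw1 = c(0.07, 0.10, 0.02, 0.04),
  draw2 = c(0.08, 0.11, 0.03, 0.05),
  draw3 = c(0.09, 0.12, 0.04, 0.06),
  stringsAsFactors = FALSE
)
UW <- data.frame(
  MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
  draw1 = c(0.71, 0.11, 0.021, 0.041),
  draw2 = c(0.81, 0.111, 0.031, 0.051),
  draw3 = c(0.91, 0.131, 0.041, 0.061),
  stringsAsFactors = FALSE
)

# assign_dw() is a hypothetical helper, not part of the original code.
# It looks up the chosen draw in both tables for every row at once, then
# keeps the UW value for "No Care" rows and the TW value for all others.
assign_dw <- function(dat, treated, untreated, draw) {
  key <- as.character(dat$MN.mapping_for_N)
  tw_vals <- treated[[draw]][match(key, treated$MN.mapping_for_N)]
  uw_vals <- untreated[[draw]][match(key, untreated$MN.mapping_for_N)]
  dat$DW <- ifelse(dat$wherecare == "No Care", uw_vals, tw_vals)
  dat
}

result <- assign_dw(Data, TW, UW, draw = "draw2")
print(result)
```

With `draw = "draw2"` the `DW` column comes out as 0.08, 0.11, 0.81, matching the expected table in the question.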

If you want to wrap everything in a function (including drawselect), try something along the lines of the following:

DALY_FX_LT_NR<-function(x, y, z){ # x would be Data, y would be TW, z would be UW
  # pick one draw number; the draw columns are draw1 .. draw(ncol(y)-1)
  samplepick<-sample(seq_len(ncol(y)-1),1)
  for (i in seq_len(nrow(x))){
    if (x$wherecare[i] != "No Care"){
      x$DW[i]<- y[which(y$MN.mapping_for_N==as.character(x$MN.mapping_for_N[i])), paste0("draw", samplepick)]
    } else {
      x$DW[i]<- z[which(z$MN.mapping_for_N==as.character(x$MN.mapping_for_N[i])), paste0("draw", samplepick)]
    }
  }
  return(x)
}

> DALY_FX_LT_NR(x = Data, y = TW, z = UW)
  studyid Pred_ex              wherecare MN.mapping_for_N   DW
1    4720   44.29 Work or Private Clinic              N48 0.09
2    2659   55.05               Hospital               N1 0.12
3    1612   59.99                No Care              N48 0.91
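
Since the stated end goal is to multiply the drawn value by `Pred_ex` and repeat that many times, here is a rough sketch of how the repetition might look with `replicate()`. It assumes the sample values from the question, numeric draw columns, and one shared draw number for both tables per iteration; `one_draw` is a hypothetical name, and 100 iterations is an arbitrary choice.

```r
# Sample data (assumption: same values as in the question, stored as numerics).
Data <- data.frame(
  Pred_ex = c(44.29, 55.05, 59.99),
  wherecare = c("Work or Private Clinic", "Hospital", "No Care"),
  MN.mapping_for_N = c("N48", "N1", "N48"),
  stringsAsFactors = FALSE
)
TW <- data.frame(
  MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
  draw1 = c(0.07, 0.10, 0.02, 0.04),
  draw2 = c(0.08, 0.11, 0.03, 0.05),
  draw3 = c(0.09, 0.12, 0.04, 0.06),
  stringsAsFactors = FALSE
)
UW <- data.frame(
  MN.mapping_for_N = c("N48", "N1", "N2", "N3"),
  draw1 = c(0.71, 0.11, 0.021, 0.041),
  draw2 = c(0.81, 0.111, 0.031, 0.051),
  draw3 = c(0.91, 0.131, 0.041, 0.061),
  stringsAsFactors = FALSE
)

# one_draw() is a hypothetical helper: it picks one draw column at random,
# looks up the matching value per row, and returns Pred_ex * DW.
one_draw <- function(dat, treated, untreated) {
  draw_cols <- grep("^draw", names(treated), value = TRUE)
  draw <- sample(draw_cols, 1)  # sampling names avoids off-by-one index errors
  key <- as.character(dat$MN.mapping_for_N)
  tw_vals <- treated[[draw]][match(key, treated$MN.mapping_for_N)]
  uw_vals <- untreated[[draw]][match(key, untreated$MN.mapping_for_N)]
  dw <- ifelse(dat$wherecare == "No Care", uw_vals, tw_vals)
  dat$Pred_ex * dw
}

set.seed(1)  # only so the sketch is reproducible
boot <- replicate(100, one_draw(Data, TW, UW))
dim(boot)    # one row per row of Data, one column per iteration
```

Each column of `boot` is one iteration, so row-wise summaries such as `rowMeans(boot)` or `apply(boot, 1, quantile, probs = c(0.025, 0.975))` would give per-row estimates and intervals.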

Upvotes: 1
